Netapp clustering and NFS failover - toasters

16 Feb 2003


I'm evaluating NFS server appliances to be used by a couple of dozen clients
(currently all Solaris, but possibly Linux and HP-UX in future), all running a
distributed application.  We're looking at how clusters handle failing over
NFS serving from one NetApp to another.

From the research we've done on NetApp and other vendors, it looks like it
takes about 20 seconds for the NFS server daemon to come back, but the thing
that kills many vendors is that client lock recovery is slow and/or buggy.
The lockd grace period defaults to 45s on most servers and can be tuned
down, but at the default that still means you are down for 65 seconds
(20s for the daemon plus 45s of grace period), and that's pretty painful.
And that's assuming that the server implementation is clever enough to
correctly transfer knowledge about locks from the dead node to the active
server node, something which we have seen fail to happen on at least one
major vendor's implementation.
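
To make the arithmetic above concrete, here is a minimal sketch in Python. The 20s restart and 45s grace period are the figures quoted above; the 30s and 15s values are hypothetical tuned settings chosen only for illustration, not numbers from any vendor, and the function name is my own.

```python
def failover_window(daemon_restart_s, grace_period_s):
    """Rough worst-case stall for a lock-holding client, in seconds.

    Assumes the grace period only starts once the NFS daemon is back,
    so the two delays simply add up.
    """
    return daemon_restart_s + grace_period_s


# 45 is the default quoted above; 30 and 15 are hypothetical tuned values.
for grace in (45, 30, 15):
    total = failover_window(20, grace)
    print(f"grace={grace}s -> about {total}s before locking works again")
```

Even with the grace period tuned well down, the daemon restart time puts a floor under the total.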

Can anyone share their experiences of failing over clustered NetApps for
NFS?  What kind of failover times do you see (including lock recovery)?
Does anything not work properly?
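
For anyone willing to measure this, below is a rough sketch of the kind of client-side probe I have in mind. It assumes a hard-mounted NFS filesystem, where a lock request or synced write simply blocks until the server is serving again, so the longest single cycle approximates the outage as seen by the client. The function names and the path in the usage note are hypothetical.

```python
import fcntl
import os
import time


def probe_once(path):
    """Take an exclusive lock, do a synced write, unlock; return seconds taken."""
    start = time.monotonic()
    with open(path, "a") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)   # blocks until the lock is granted
        f.write("x")
        f.flush()
        os.fsync(f.fileno())            # force the write out to the server
        fcntl.lockf(f, fcntl.LOCK_UN)
    return time.monotonic() - start


def longest_stall(path, iterations, interval_s=0.0):
    """Run the probe repeatedly and return the slowest cycle in seconds."""
    worst = 0.0
    for _ in range(iterations):
        worst = max(worst, probe_once(path))
        time.sleep(interval_s)
    return worst
```

Running something like longest_stall("/mnt/nfs/probe.lock", 600, 0.5) on a client while forcing a takeover would give one number that covers both the daemon restart and lock recovery. It does not check whether locks held across the failover were preserved correctly; that would need a second client contending for the same file.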
tia
--herb