Re: A netapp failover possibility...???

25 Sep 1998


      Stephen;
An excellent understanding of how CF works.
We're looking forward to working with you on
your implementation.
Regards
Lew Kirschner
-------------
At 11:08 AM 9/18/1998 -0400, Stephen C. Losen wrote:
...
The www.netapp.com web site has a white paper on netapp's clustered
failover (CF) design.  We have two F630s that we plan to convert to CF. 
Netapp has not released CF yet, but we have read the white paper and had
some of our questions answered by tech folks.
Clustered failover connects two filers to the same stack of disk shelves. 
Each filer has its own set of volumes built on its own set of disks, so
during normal operation, the cluster behaves like (and has the performance
of) two separate filers.  If one filer fails, the healthy filer takes over
the failed filer's volumes and starts serving them.
It looks like a failover will be about as disruptive as a reboot, i.e.,
NFS mounts via udp will survive, but all CIFS connections will be lost (on
the failed filer).  Presumably the healthy filer will not have its service
interrupted.  Failover will take a few minutes, so it will appear as if
the failed filer simply rebooted.
Once the failed hardware is repaired, I'm pretty sure that going back
to normal operation requires rebooting the healthy filer, because
it has to "offline" the volumes that it took over and that requires
a reboot.
So CF doesn't eliminate service disruptions.  It just means that
certain hardware failures become no more disruptive than one
unscheduled reboot and one scheduled reboot.  Plus you have poorer
performance while one filer does the work of two.
Designing CF so that there are absolutely no service disruptions
might very well require one filer to be in "standby" mode at all
times and do no work during normal operations.  Since hardware
failures are rare, this is wasteful.
The key to clustered failover is that each filer uses only half of its own
NVRAM and mirrors its NVRAM state onto the unused half of its partner's
NVRAM.  So in the case of a hardware failure, the healthy filer takes over
the volumes of the failed filer, takes over the IP address(es) of the
failed filer, and starts serving the volumes.  The takeover procedure is
similar to when a standalone filer recovers from an abrupt power outage. 
The filesystem resumes at the latest WAFL consistency point, and the NVRAM
log is replayed to perform transactions that happened after the
consistency point.
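The NVRAM mirroring and takeover replay described above can be sketched
as a toy model.  This is only an illustration of the idea, not netapp's
implementation: the Filer class, its method names, and the key/value
"filesystem" are all hypothetical.

```python
class Filer:
    """Toy model of one filer in a two-node cluster (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self.partner = None
        self.disk_state = {}   # on-disk state as of the last consistency point
        self.local_log = []    # this filer's own half of NVRAM
        self.partner_log = []  # mirror of the partner's log (the other half)

    def write(self, key, value):
        # Log the operation locally and mirror it into the partner's NVRAM.
        self.local_log.append((key, value))
        self.partner.partner_log.append((key, value))

    def consistency_point(self):
        # Flush logged operations to disk, then clear both copies of the log.
        for key, value in self.local_log:
            self.disk_state[key] = value
        self.local_log.clear()
        self.partner.partner_log.clear()

    def take_over(self):
        # Start from the partner's last consistency point on disk, then
        # replay the mirrored log to recover operations made after it.
        state = dict(self.partner.disk_state)
        for key, value in self.partner_log:
            state[key] = value
        self.partner_log.clear()
        return state


def make_cluster():
    # Build two filers and point each one at the other as its partner.
    a, b = Filer("a"), Filer("b")
    a.partner, b.partner = b, a
    return a, b


a, b = make_cluster()
a.write("f1", "v1")
a.consistency_point()
a.write("f2", "v2")      # logged in NVRAM only, not yet on disk
print(b.take_over())     # {'f1': 'v1', 'f2': 'v2'}
```

In this sketch the write made after the consistency point survives the
failure only because it was mirrored into the partner's NVRAM.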
Clustered failover requires fibre channel disks because you can't hook
SCSI disk shelves up to multiple filers.  Each fibre channel disk shelf in
a CF system has two fibre channel interfaces, and each filer has its own
daisy chain linking it to all the shelves.  During normal operation each
disk is "owned" by one filer.  It appears that disks from the same shelf
can be owned by different filers, so you can add a disk to any shelf and
assign it to either filer.  I don't know if hot spares can be left
unassigned, so it may be necessary for each filer to have its own hot
spare(s) assigned to it.  In the event of a failover, the healthy filer
takes over all the disks.
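A minimal sketch of the per-disk ownership idea, assuming ownership is
just a table mapping each disk to a filer.  The (shelf, bay) keys, the
filer names, and the take_over_disks function are made up for
illustration, and unassigned hot spares are not modeled.

```python
def take_over_disks(ownership, failed, healthy):
    """Return a new ownership table with the failed filer's disks reassigned."""
    return {
        disk: (healthy if owner == failed else owner)
        for disk, owner in ownership.items()
    }


# Disks in the same shelf may be owned by different filers.
ownership = {
    (1, 0): "filer_a",
    (1, 1): "filer_b",
    (2, 0): "filer_a",
    (2, 1): "filer_b",
}

after = take_over_disks(ownership, failed="filer_a", healthy="filer_b")
print(sorted(set(after.values())))  # ['filer_b']
```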
Netapp's CF supports only two filers.  This makes sense since each disk
shelf must have a fibre channel interface for each filer in the cluster. 
This puts a rather severe limit on the theoretical size of a cluster using
this hardware.
Steve Losen   scl@virginia.edu    phone: 804-924-0640
University of Virginia               ITC Unix Support
NETWORK  APPLIANCE
======================================================================
    FAST!  SIMPLE!   RELIABLE!  MULTIPROTOCOL!!
**********************************************************************
Lew Kirschner - Eastern Area     V-Mail: 732 603 7330
                 Reseller Mgr      (V-Mail Only)
Network Appliance	           	   Reach #: 914-369-3830	    
35 Sagamore Avenue         	   Fax #:   914-369-3832
Suffern, New York 10901          e-mail: lewisk@netapp.com	
                               http://www.netapp.com
**********************************************************************
The Market Leader in Network Attached Storage for
 Multiprotocol Environments!
