The procedure below has worked many times on our F740C nodes, and the data on both nodes stays accessible the entire time (apart from the takeover and giveback themselves):
- Manually take over the problem filer from the working filer (cf takeover).
- Power off the problem filer head and unplug everything from it (cluster cables, FCAL cables, everything). Only the head, not the disk shelves.
- Replace whatever needs replacing.
- Connect everything except the cluster connection. I can't remember if I left the FCAL cable to the partner's disks disconnected as well, but I think I did just to be cautious. I do know I keep the FCAL connection to its own disks connected.
- Boot the problem filer with the diagnostic disk and run all diagnostics. As long as you shut down cleanly you can do the NVRAM tests, and as long as the cluster is not connected you can run all the MB tests. You can run the memory tests with everything connected.
- After the diagnostics have run, take out the diag disk and reboot the filer. It should stop at the "waiting for giveback..." message.
- Do a cf giveback on the working filer.
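The diagnostic rules in the step list above can be written down as a small check. This is only a sketch of those rules as stated here, not anything from the diagnostic disk itself: the function name `allowed_diagnostics` and the group labels "memory", "nvram" and "motherboard" are made up for illustration, and it assumes the two conditions mentioned (clean shutdown, cluster cable disconnected) are the only ones that matter.

```python
from typing import List


def allowed_diagnostics(clean_shutdown: bool, cluster_connected: bool) -> List[str]:
    """Return the diagnostic groups the steps above describe as OK to run.

    The labels are hypothetical names for this sketch, not actual
    entries from the diagnostics menu.
    """
    # Memory tests can run with everything connected.
    tests = ["memory"]
    # NVRAM tests are OK only if the filer was shut down cleanly.
    if clean_shutdown:
        tests.append("nvram")
    # Motherboard (MB) tests are OK only with the cluster cable unplugged.
    if not cluster_connected:
        tests.append("motherboard")
    return tests


if __name__ == "__main__":
    # The situation described in the procedure: clean shutdown, cluster unplugged.
    print(allowed_diagnostics(clean_shutdown=True, cluster_connected=False))
```

With a clean shutdown and the cluster cable unplugged, all three groups come back, which matches "run all diagnostics" in the list above.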
I've done this procedure about 10 times on an F740C and an F880C and have never had a problem with data becoming unavailable or corrupted. Just for the record, data center power outages cause most of these failures, not the NetApps.
Jeff
From: Geoff Hardin geoff.hardin@dalsemi.com
To: toasters toasters@mathworks.com
Subject: Cluster failover question
Date: Mon, 23 Feb 2004 16:04:49 -0600
We've always had cluster failover on our filers for those times when something goes wrong on one filer and the other filer can serve the data. Realistically speaking, that rarely happens because the filers are stable. However, as time passes and the filers grow older, they develop more and more "personality."
For example, we have a pair of F760s that are our problem children. As much as we'd like to pawn them off to another group, put them out to pasture, or replace them outright, that does not appear to be happening in the near future. Unfortunately, one of them has now developed a problem with a memory DIMM on the motherboard. In the past, we've had the luxury of being able to shut down the clustered pair of filers, but in our price-conscious environment, people are asking what the point of clustering is if we can't do maintenance and keep the data available.
So, my question is this: is it possible to work on a filer head while serving the data up from the cluster partner? My concern is that we would upset the FC-AL integrity, because we'll have to unplug the FC-AL cables from the adapters on the head when we pull the motherboard tray out. Then, since the recommended course of action is to run diagnostics after reseating and/or replacing the DIMMs, could we run a small set of the diagnostics before plugging in the FC-AL cables? Maybe we could / should use the FC-AL reset function from the diagnostics menu to get the loops back to normal?
Maybe we've just been too cautious with our data, but I'd like to hear from other toasters if this is possible, advisable, and safe before putting our data at risk.
Thanks,
Geoff
--
Geoff Hardin
geoff.hardin@dalsemi.com
Put on your seatbelt. I wanna try something.