----- Original Message -----
From: Jeff Krueger jkrueger@qualcomm.com
To: Will Partain partain@mekb2.sps.mot.com
Cc: toasters@mathworks.com
Sent: Friday, February 18, 2000 1:37 PM
Subject: Re: NetApp disk replacement "disaster": post-mortem
From Will Partain on Fri, 18 Feb 2000 18:46:53 GMT:
Old NetApp F220. One disk died. Called support guys; they sent another disk. Typed 'disk swap', swapped a disk, typed 'disk unswap'.
This isn't the correct procedure. To swap a disk, type "disk swap". You may then remove a *single* disk. Wait at least 30 seconds for the disk unit status check to complete, or until you see confirmation of that in the /etc/messages file. Now type "disk swap" again. You may now insert a single disk.
I would just like to say that this is an ultra-safe way to do it; you should be able to get away with just one "disk swap" command, then pull the bad disk and plug the new one in. At least, that used to work fine. What does the manual say? Moreover, why didn't you follow what the manual says, Will?
When I rebooted, it failed to load the OS; error "Invalid opcode" (i.e. it read junk off the disk).
I'm not familiar with this particular boot-time gotcha, but it sounds consistent with getting the disks mixed up and possibly not issuing both of the required "disk swap" commands.
I agree, although I suspect it was just trying to read off the new disk. If he pulled the disk he just put in, I bet there is a good chance the filer would have booted fine.
So I rebooted from floppies (after swapping the disks back around correctly), and that seemed cool -- it figured out which disk was what, and did all the necessary RAID reconstruction. Everything looked OK.
Booting from floppies is a good idea at that point.
Absolutely. At that point, he did the right thing about checking the disks and making sure reconstruction was back on track, but he never typed "download" again. (Although I'm not sure why this wasn't done as part of the reconstruction process.)
Depending on where you called support, they should have been able to walk you through the disk swapping procedure. It's not brain surgery, but not following the instructions explicitly can lead to a "disaster". Unfortunately, you just learned that the hard way. =(
Yeah, I have to wonder if you got a bad support person, or if you did not actually tell them exactly what happened. He mentioned that at one point he said "never mind"; if he meant that literally, he didn't give the support person time to tell him to type "download" and everything would be fine again.
Bruce