From Will Partain on Fri, 18 Feb 2000 18:46:53 GMT:
Old NetApp F220. One disk died. Called support guys; they sent another disk. Typed 'disk swap', swapped a disk, typed 'disk unswap'.
This isn't the correct procedure. To swap a disk, type "disk swap". You may then remove a *single* disk. Wait at least 30 seconds for the disk unit status check to complete, or until you see confirmation of that in the /etc/messages file. Now type "disk swap" again. You may now insert a single disk.
You only use "disk unswap" when you have previously issued a "disk swap" command, but have decided not to add or remove a disk from the system. The "disk unswap" command allows the SCSI bus to resume communications.
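To make the ordering concrete, here is a rough sketch in Python that checks a sequence of operator actions against the procedure as described above. It is only a model for illustration, not anything that runs on the filer: the function name check_swap_sequence and the action labels "remove", "insert" and "wait" are made up for this example, and it assumes each "disk swap" covers exactly one disk removal or insertion.

```python
def check_swap_sequence(actions):
    """Return a list of problems found in a sequence of operator actions.

    Action labels are invented for this sketch: "disk swap",
    "disk unswap", "remove", "insert", "wait".
    """
    problems = []
    pending_swap = False  # a "disk swap" was issued and not yet used
    needs_wait = False    # a disk was removed and the ~30 s wait has not happened

    for step, action in enumerate(actions, start=1):
        if action == "disk swap":
            if needs_wait:
                problems.append(
                    f"step {step}: 'disk swap' issued before waiting after removal")
            pending_swap = True
        elif action in ("remove", "insert"):
            if not pending_swap:
                problems.append(
                    f"step {step}: '{action}' without a pending 'disk swap'")
            pending_swap = False  # assumption: one disk change per "disk swap"
            needs_wait = (action == "remove")
        elif action == "wait":
            needs_wait = False
        elif action == "disk unswap":
            if not pending_swap:
                problems.append(
                    f"step {step}: 'disk unswap' with no pending 'disk swap'")
            pending_swap = False
        else:
            problems.append(f"step {step}: unknown action '{action}'")
    return problems


recommended = ["disk swap", "remove", "wait", "disk swap", "insert"]
as_reported = ["disk swap", "remove", "insert", "disk unswap"]

print(check_swap_sequence(recommended))
for problem in check_swap_sequence(as_reported):
    print(problem)
```

Under these assumptions the recommended sequence comes back clean, while the sequence from the original report is flagged twice: once for inserting a disk without a second "disk swap", and once for issuing "disk unswap" when no swap was pending.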
When I rebooted, it failed to load the OS; error "Invalid opcode" (i.e. it read junk off the disk).
I'm not familiar with this particular boot-time gotcha, but it sounds consistent with getting the disks mixed up and possibly not issuing both of the required "disk swap" commands.
So I rebooted from floppies (after swapping the disks back around correctly), and that seemed cool -- it figured out which disk was what, and did all the necessary RAID reconstruction. Everything looked OK.
Booting from floppies is a good idea at that point.
What I *think* I could've done was: reboot from floppies; splat the ontap stuff into /vol/vol0/etc from afar; typed 'download' at the filer, and it might've worked. (But would I have had reason to trust any of the ordinary RAID data at this point?)
Ehe. If you can't boot the kernel off the SCSI disks, but you can off floppies then you have ...
1) ... a signaling problem on the SCSI bus - could be a bad cable, host adapter, shelf, or disk.
2) ... a horked set of boot blocks on your SCSI disks.
In this situation, just running download from the console will update the boot block image on your disks from the currently installed OS in /etc.
If you can boot the kernel off SCSI disks, but you can't load your root volume then you've got (it will PANIC and scream this at you) an inconsistent volume - you're missing more than one disk.
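The two cases above boil down to a small decision table. As a rough sketch only (the function likely_causes and its three boolean parameters are hypothetical names for this example, and it covers nothing beyond the cases described here):

```python
def likely_causes(boots_from_scsi, boots_from_floppy, root_volume_loads):
    """Map observed boot symptoms to the suspects described in the text."""
    if not boots_from_scsi and boots_from_floppy:
        # Kernel loads from floppies but not from the SCSI disks.
        return [
            "SCSI signaling problem (bad cable, host adapter, shelf, or disk)",
            "damaged boot blocks on the SCSI disks (run 'download' from the console)",
        ]
    if boots_from_scsi and not root_volume_loads:
        # Kernel loads from SCSI, but the root volume cannot be brought up.
        return ["inconsistent volume: more than one disk is missing"]
    return []  # not one of the cases covered here


for cause in likely_causes(boots_from_scsi=False,
                           boots_from_floppy=True,
                           root_volume_loads=True):
    print(cause)
```

The empty list for every other combination is deliberate: the sketch only encodes the situations discussed in this message and does not guess at anything else.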
If you spot a place where I (+ support guy) went badly wrong, other than what I've outlined, I'd like to know.
Depending on where you called support, they should have been able to walk you through the disk swapping procedure. It's not brain surgery, but not following the instructions explicitly can lead to a "disaster". Unfortunately, you just learned that the hard way. =(
As a tip - label the hell out of your shelves. You never know when the amber failure light won't light up on the problem disk or if it's a 3AM swap when your brain just isn't in gear. Our filers are labeled to the extreme and probably ISO 9001 compliant. =) It's tedious, but we've found that eliminating the easy mistakes is worth the effort.
Good luck!
-- Jeff
-- 
----------------------------------------------------------------------------
Jeff Krueger                   E-Mail: jeff@qualcomm.com
NetApp File Server Lead        Phone:  858-651-6709
IT Engineering and Support     Fax:    858-651-6627
QUALCOMM, Incorporated         Web:    www.qualcomm.com