All the drives on the shelf disappeared from the system when it "wedged", so they were marked failed (or just missing). When I initially made the volumes, I organized the raid groups so that there's only one drive from any given raid group on a shelf. With 12 disks per shelf and 7 shelves in the system (for example, on the R100), I set my raidgroup size to 7 and made sure that when the volumes were created or expanded, the disks were added in groups of 7 by name, one from each shelf. I organized it that way exactly for this failure case - losing a whole shelf.

When the drives disappeared there weren't enough spares to rebuild (until we power-cycled the shelf), so most of the raid groups were running one disk short, but they didn't go offline since each was only missing one disk. Because the volumes didn't go offline, the disks couldn't be reassimilated into the system after the reboot - they were now out of date - so they all became spares. So basically the system had to rebuild one drive in each of the 12 raid groups on the system, two at a time. The NetApp folks chalked it up to a bug and I updated them to 6.5.2. Luckily the systems are internally used, so the downtime wasn't a huge deal.
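For illustration only, here's a minimal Python sketch of that layout rule - one disk per shelf in each raid group, so losing a shelf degrades every group by at most one member instead of taking any group down. The disk-naming scheme (`2a.<id>`) and the shelf/bay math are assumptions for the example, not the actual commands or names used on the filer.

```python
# Sketch of the "one disk per shelf per raid group" layout described above.
# Shelf/bay counts match the R100 example; disk names are hypothetical.

NUM_SHELVES = 7               # shelves on the head
DISKS_PER_SHELF = 12          # bays per shelf
RAIDGROUP_SIZE = NUM_SHELVES  # raidgroup size 7: one member from each shelf

def disk_name(shelf: int, bay: int) -> str:
    """Hypothetical disk name, e.g. adapter 2a, ID = shelf*16 + bay."""
    return f"2a.{shelf * 16 + bay}"

def build_raid_groups():
    """Return 12 raid groups of 7 disks, each group spanning all shelves."""
    groups = []
    for bay in range(DISKS_PER_SHELF):  # one raid group per bay position
        groups.append([disk_name(shelf, bay) for shelf in range(NUM_SHELVES)])
    return groups

if __name__ == "__main__":
    for i, group in enumerate(build_raid_groups()):
        # Any single shelf failure removes at most one disk from each group,
        # so every group stays online in degraded mode rather than failing.
        print(f"rg{i}: {' '.join(group)}")
```

With that arrangement, a whole-shelf outage looks like one missing disk in each of the 12 groups, which is exactly the degraded-but-online behavior described above.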
Not something I want happening on a regular basis, and with data on the volumes it was a bit of a heart-stopper when it started. Lucky for me, I suppose, that no data was lost.
Simon.
-----Original Message-----
From: Michael Christian [mailto:mchristi@yahoo-inc.com]
Subject: RE: New Simplified Monitoring Tool
How did a shelf hang result in 10 failed drives? And how did you avoid a double disk failure with that many failures?