We had a problem with our Netapp this morning that could
potentially be quite serious. One drive near the beginning of the
chain (ID 2, I believe) was failed out by the Netapp. Very shortly
thereafter, the filer crashed with a RAID panic and rebooted. Upon
rebooting, it noticed that drive ID 2 was not actively being used, and
proceeded to add it to the hot spare pool. Then it began
reconstructing the data onto (you guessed it) drive ID 2.

In this scenario, there was no time to pull out the bad drive, and
the Netapp happily rebuilt the data onto it. I guess the correct
procedure now is to forcibly fail that drive, let the data rebuild onto
our good spare drive, and then remove drive ID 2. Could the Netapp
somehow mark a bad drive so that the failure is remembered across reboots?
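
To make the request concrete, here is a rough sketch in Python of the kind of bookkeeping I have in mind. It is not how the Netapp works internally. The class name, the JSON file, and the idea of keying on a drive serial number rather than the SCSI ID are all assumptions made up for illustration.

```python
import json
import os
import tempfile


class FailedDriveRegistry:
    """Hypothetical persistent list of drives that were failed out."""

    def __init__(self, path):
        self.path = path
        self.failed = set()
        # Reload earlier failures so they survive a crash or reboot.
        if os.path.exists(path):
            with open(path) as f:
                self.failed = set(json.load(f))

    def mark_failed(self, serial):
        """Record a failure and flush it to stable storage right away."""
        self.failed.add(serial)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.failed), f)
            f.flush()
            os.fsync(f.fileno())
        # Rename over the old file so a crash mid-write cannot corrupt it.
        os.replace(tmp, self.path)

    def is_failed(self, serial):
        return serial in self.failed


def eligible_spares(unassigned, registry):
    """Keep only unassigned drives that were never marked as failed."""
    return [s for s in unassigned if not registry.is_failed(s)]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "failed_drives.json")
    FailedDriveRegistry(path).mark_failed("SN-0002")  # drive fails, then panic
    after_reboot = FailedDriveRegistry(path)          # fresh state after reboot
    print(eligible_spares(["SN-0002", "SN-0009"], after_reboot))
```

The point is only that the failure is written out before anything else can go wrong, and that the spare-pool logic checks that record at boot before it picks a reconstruction target.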
--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"