We had a problem with our Netapp this morning that could be quite serious. One drive near the beginning of the chain (ID 2, I believe) was failed out by the Netapp. Very shortly thereafter, the filer crashed with a RAID panic and rebooted. Upon rebooting, it noticed that drive ID 2 was not actively being used and added it to the hot spare pool. Then it began reconstructing the data onto (you guessed it) drive ID 2.
In this scenario, there was no time to pull the bad drive, and the Netapp happily rebuilt the data onto it. I guess the correct procedure now is to forcibly fail that drive so that the data rebuilds onto our good spare drive, and then remove drive ID 2. Could the Netapp somehow mark a bad drive so that the information is kept across boots?
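
To make the question concrete, here is a rough sketch in Python of the behavior I have in mind. It is purely illustrative and says nothing about how the Netapp actually stores its state: `FailedDriveRegistry`, `eligible_spares`, and the JSON file are all hypothetical names made up for this example. The idea is that a drive is recorded as failed on stable storage the moment it is failed out, and that after a reboot any unused drive found in that record is kept out of the hot spare pool. The sketch keys on the drive ID for simplicity; a real implementation would presumably need something like a serial number, since a replacement drive would take over the same ID.

```python
import json
import os
import tempfile


class FailedDriveRegistry:
    """Hypothetical on-disk record of drives that have been failed out."""

    def __init__(self, path):
        self.path = path
        self.failed = set()
        # On startup (i.e. after a reboot), reload whatever was recorded earlier.
        if os.path.exists(path):
            with open(path) as f:
                self.failed = set(json.load(f))

    def mark_failed(self, drive_id):
        """Record a failed drive and flush the record to stable storage."""
        self.failed.add(drive_id)
        directory = os.path.dirname(os.path.abspath(self.path))
        # Write to a temporary file, then rename it over the old record, so a
        # crash in the middle of the write cannot leave a half-written file.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(sorted(self.failed), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def is_failed(self, drive_id):
        return drive_id in self.failed


def eligible_spares(unused_drive_ids, registry):
    """Return the unused drives that may join the hot spare pool."""
    return [d for d in unused_drive_ids if not registry.is_failed(d)]
```

With something like this, a second `FailedDriveRegistry` opened on the same file after the reboot would still report drive ID 2 as failed, so `eligible_spares` would skip it and the rebuild would go to the good spare instead.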