Maintenance Garage is a new feature (introduced in 7.0.4, I believe) that takes drives out of service when they aren't functioning properly and tries to recover them without outright failing them. It is also supposed to allow operations to continue to all other drives in the raid group and, when/if the drive is recoverable, only update the changes that the one questionable drive has missed.
I'm paraphrasing quite a bit, but I recall that to be the gist. Either way, pretty cool stuff.
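To make the "only update the changes the drive has missed" idea concrete, here is a toy sketch. It is not how ONTAP implements this: the class and method names are made up for illustration, and it uses plain mirroring rather than parity to stay short.

```python
# Toy model only: NOT ONTAP's implementation. All names here are hypothetical.

class ToyDisk:
    def __init__(self, nblocks):
        self.blocks = [0] * nblocks
        self.online = True


class ToyRaidGroup:
    """Mirrors every write to all online disks and records what offline disks miss."""

    def __init__(self, ndisks, nblocks):
        self.disks = [ToyDisk(nblocks) for _ in range(ndisks)]
        self.missed = {}  # disk index -> set of block numbers written while offline

    def take_offline(self, idx):
        # Pull the questionable disk out of service without failing it.
        self.disks[idx].online = False
        self.missed[idx] = set()

    def write(self, block, value):
        # Writes keep going to the healthy disks; offline disks only get a note.
        for idx, disk in enumerate(self.disks):
            if disk.online:
                disk.blocks[block] = value
            else:
                self.missed[idx].add(block)

    def bring_back(self, idx):
        """Copy only the missed blocks from a healthy disk; return how many were copied."""
        source = next(d for d in self.disks if d.online)
        disk = self.disks[idx]
        dirty = self.missed.pop(idx)
        for block in dirty:
            disk.blocks[block] = source.blocks[block]
        disk.online = True
        return len(dirty)
```

If two blocks are written while a disk is offline, bring_back() copies just those two blocks instead of rebuilding the whole disk, which is the point of the feature as I understand it.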
Glenn
-----Original Message-----
From: Willeke, Jochen [mailto:Jochen.Willeke@wincor-nixdorf.com]
Sent: Wednesday, May 31, 2006 9:51 AM
To: Glenn Walker; toasters@mathworks.com
Subject: RE: broken disk becomes spare after reconstruct
Hi toasters,
We are running 7.0.1R1.
BTW: what is the maintenance garage? Never heard of this.
Regards
Jochen
-----Original Message-----
From: Glenn Walker [mailto:ggwalker@mindspring.com]
Sent: Wednesday, May 31, 2006 3:44 PM
To: Willeke, Jochen; toasters@mathworks.com
Subject: RE: broken disk becomes spare after reconstruct
Basically it appears that the disk started to work correctly again (i.e., it was responsive to commands). At that point, ONTAP will return it to the spare pool...
Most of the time disks don't actually physically fail - ONTAP is very concerned about data consistency so it will fail misbehaving drives so that it no longer uses them. Sometimes the drives start behaving again, and can be used...
ATA drives are very bad for this: they typically go into internal recovery procedures and don't communicate status to the controller/OS - FCAL drives are very good about communicating status and continuing with other operations in the meantime. I've typically seen this behavior on R200s more than on FAS-series systems (with FCAL drives).
Out of curiosity - are you utilizing 7.0.4 or newer? Wondering if maintenance garage would help...
Glenn
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Willeke, Jochen
Sent: Wednesday, May 31, 2006 8:50 AM
To: toasters@mathworks.com
Subject: broken disk becomes spare after reconstruct
Hello Toasters :-)
I have seen something strange on one of our R200s. Last week I got an error message with the following output:
[scsi.cmd.notReadyCondition:notice]: Device 2b.35: Device returns not yet ready: CDB 0x28:18974bb9:0009: Sense Data SCSI:not ready - (0x2 - 0x4 0x7 0x2a)(65103).
[scsi.cmd.checkCondition:error]: Device 2b.35: Check Condition: CDB 0x28:154efb9d:022e: Sense Data SCSI:not ready - (0x2 - 0x4 0x7 0x2a)(209311).
[raid.config.filesystem.disk.recovering:error]: Attempting to bring file system Disk /aggr0/plex0/rg1/2b.35 Shelf 2 Bay 3 [NETAPP X266_MTOMC320PAD R5VV] S/N [A81K4MBE] back into service.
......then the normal procedure of taking one spare and reconstructing the raid group. But then comes the part I am wondering about.....
[raid.assim.disk.spare:notice]: Sparing Disk /2b.35 Shelf 2 Bay 3 [NETAPP X266_MTOMC320PAD R5VV] S/N [A81K4MBE], because volume is online and complete
The system takes the broken disk as a spare?!? How can this be?
Perhaps somebody can help, because NOW does not explain this situation in enough depth to make it understandable to me :D
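For anyone who wants to follow one disk through messages like the ones quoted above, here is a small sketch that pulls out the event name, the severity and the disk ID. The patterns are only inferred from the four messages in this mail and are an assumption, not a documented ONTAP log format; parse_line and events_for_disk are made-up helper names.

```python
import re

# Assumed shape, inferred from the quoted messages: "[event.name:severity]: message"
LOG_RE = re.compile(r"^\[(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$")
# Assumed disk ID shape, e.g. "2b.35" (adapter, then a dot, then the device number)
DISK_RE = re.compile(r"\b(\d+[a-z]\.\d+)\b")


def parse_line(line):
    """Return a dict with event, severity, message and disk, or None if no match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    disk = DISK_RE.search(fields["message"])
    fields["disk"] = disk.group(1) if disk else None
    return fields


def events_for_disk(lines, disk):
    """List the event names that mention the given disk, in the order they appear."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed["disk"] == disk:
            events.append(parsed["event"])
    return events
```

Run over the messages above with disk "2b.35", events_for_disk() gives the sequence from scsi.cmd.notReadyCondition through to raid.assim.disk.spare, which makes the "not ready, recovering, spared" progression easier to see.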
Regards
Jochen