It's been a long time since I've dealt with NetApps, so there may be a simple solution to this that I've forgotten. A couple of disks failed on an old F720: the first failure caused a rebuild onto the hot spare, and the second put the filer into degraded mode. An admin pulled the dead drives (without first doing a 'disk remove') and inserted two drives of the same size from a retired system that used to be its twin (same hardware and firmware), but 'sysconfig -r' still shows the two disks as broken.

The references I have all seem to be for later releases: 'priv set advanced' followed by 'disk unfail <disk>' comes back with: disk: Did not recognize option "unfail". I also seem to recall that replaced drives would show up as foreign volumes since they weren't zeroed on the old system, so perhaps the filer just doesn't realize the drives were replaced. Years ago when I hot-swapped a drive, the F720 seemed to 'freeze' for a minute or so, but the person who replaced these drives said he didn't notice any pause in the activity lights on the other drives while he was swapping them.

I do remember that the filer will only run for 24 hours in degraded mode before shutting itself down, so I've set raid.timeout to a bigger number to buy myself some time. Any suggestions on how to get it to recognize the 'new' disks?
Thanks, Frank
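For reference, the checks and the timeout change described above would look roughly like this at the console (the 72-hour value is illustrative, and the option syntax is assumed to match later ONTAP releases):

filer> sysconfig -r                 (both replaced disks still listed under "Broken disks")
filer> options raid.timeout         (prints the current value; the default is 24 hours)
filer> options raid.timeout 72      (buys three days instead of one before the degraded-mode shutdown)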
Maybe move to rc_toggle_basic first? Then try disk unfail?
-Blake
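If 'disk unfail' exists at all in this release, Blake's suggestion would go something like the following. On 5.3.x, rc_toggle_basic toggles the advanced command set and changes the prompt from 'filer>' to 'filer*>' (the disk ID 8b.9 is just an example):

filer> rc_toggle_basic
filer*> disk unfail 8b.9
filer*> rc_toggle_basic             (toggle back to the basic command set when done)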
Michael Schipp wrote:

Hi all, I do not believe 'disk unfail' was in 5.3.x, so try the procedure below. Make sure you get the right disk ID.
filer> rc_toggle_basic
filer*> disk_erase_label DISK.number (e.g. 8b.9)
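If the label erase takes, the disks should drop off the broken list. The full sequence for both replacements might look like this, with 8b.9 and 8b.11 as stand-in IDs (whether the filer then zeroes them and starts reconstruction on its own is an assumption worth verifying with 'sysconfig -r'):

filer> rc_toggle_basic
filer*> disk_erase_label 8b.9
filer*> disk_erase_label 8b.11
filer*> rc_toggle_basic
filer> sysconfig -r                 (the disks should now show as spares rather than broken)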
It turns out that I have a more serious problem: the filer doesn't seem to notice disks being inserted or removed at all. Replacing the failed disks or adding disks to empty slots has no effect on the 'sysconfig -d' output, and insertion doesn't cause activity to pause the way I remember it did in the past. A reboot might fix it, but the filer might also not come back up, so my current plan is to migrate the few remaining clients off of it and shut it down for good.
Frank
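One way to confirm that the filer really isn't seeing hot-swaps is to capture the device list before and after inserting a drive and compare (the disk ID below is a stand-in):

filer> sysconfig -d                 (note the devices listed)
   ...insert the replacement drive and wait a minute or two...
filer> sysconfig -d                 (on a working shelf and backplane the new drive, e.g. 8b.9, should appear)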