Hello Jerry
I too am astonished, but it was definitely the reconstruct. The split second it was done rebuilding, the problem went away. I was sitting there going back through my command history, and my co-worker was running various ls commands.
Things that come into my head:
- Unfixed bug 79418: when the option raid.reconstruct.perf_impact is low, the FilerView "RAID Reconstruct Speed" is high; and when raid.reconstruct.perf_impact is high, the FilerView "RAID Reconstruct Speed" is low.
- Reconstructing a parity disk should be much faster than a data disk reconstruct, because the user data can be read in parallel and no out-of-band reconstruct is needed; only the reconstruct itself is done sequentially.
- Maybe you hit another, additional reconstructable disk failure, which forced a more CPU-consuming WAFL ironing / filesystem check?
- Did you perhaps have physical problems on the disk or FC layer? Command: fcadmin (see the example commands after this list).
- I will try to reproduce your problem next week.
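If it helps, roughly what I have in mind for checking those last two points; this is a sketch from 7-mode-era ONTAP, so verify the exact commands against your release:

    filer> options raid.reconstruct.perf_impact
    raid.reconstruct.perf_impact        medium
    filer> options raid.reconstruct.perf_impact low
    filer> sysconfig -r
    filer> fcadmin device_map

The first two lines show and then change the reconstruct priority (remembering the bug above, the FilerView label is inverted). sysconfig -r shows per-RAID-group reconstruct progress, and fcadmin device_map shows what the filer sees on each FC loop, which is a quick sanity check for a sick disk or loop.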
I did not run any diagnostic commands such as statit. By the time we tracked it down to the NetApp in the first place, we were busy failing over apps to another site and scurrying around. I think we are running 6.4.1 (not connected to work right now). Other volumes were not affected. Have you ever yanked a parity disk?
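For next time, statit only needs a few seconds of collection during the slow period; from memory it goes roughly like this (it lives in advanced privilege, so double-check on 6.4.1):

    filer> priv set advanced
    filer*> statit -b
    ... let the workload run for 10-30 seconds ...
    filer*> statit -e
    filer*> priv set

statit -b begins collection and statit -e ends it and dumps per-disk utilization, which would have shown whether the reconstruct was saturating the disks in that RAID group.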
We yank everything. :-) We have 16 training filers (8 per class) and I give approximately two classes per month. We usually kill all kinds of disks on all filers. Some students kill single disks (data or parity), some force multiple disk errors by using "disk fail" or by pulling disks out physically. So we see an average of two parity failures per week. And yes, we use "hammer", "sio", and other load generation tools. ;-)
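For the curious, the software-forced version of the drill is just this (the disk name 8a.16 is made up; substitute one from your own shelf):

    filer> disk fail 8a.16
    filer> vol status -r

disk fail marks the disk as failed so a spare kicks in, and vol status -r lets you watch the reconstruct progress, which is gentler on the hardware than pulling drives, though less fun.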
So I can tell you that pulling out the two mailbox disks of the root volume at the same time will panic your filer even if you have RAID-DP. The filer needs a delay of about 10 seconds to activate a replacement disk and stamp it as a mailbox disk before the second mailbox disk is allowed to fail. If not, the filesystem can still be reconstructed, but the cluster mailbox information is lost. => Spread your root volume over multiple FC loops and use SyncMirror to get four-mailbox-disk redundancy for 99.99...% high availability.
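Roughly, on a 6.x/7-mode filer with the syncmirror_local license and spares available in both disk pools, that looks like this (the license code is a placeholder, and the exact vol mirror syntax may differ on your release):

    filer> license add <syncmirror_local code>
    filer> vol mirror vol0
    filer> sysconfig -r

After the mirror levels, sysconfig -r should show both plexes of vol0, and the mailbox disks are then spread across the two pools.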
Best regards,
Dirk