> > Looks like it finished rebuilding the disk to me ;)
>
> I think I've misused the term "degraded mode", but my question is
> still the same. What exactly is it going to do if the scrub fails?
> I'd rather the filer wasn't stressed until a new spare is in there.
I understand your question now.
If RAID scrub finds a block it can't read, it reads the corresponding
blocks on all the other drives and reconstructs the missing data. Then
it re-writes the unreadable block, which remaps the block to a new
location. I believe the remapping occurs automatically, because we set
an auto-remap flag on the drives, but this may vary from drive to
drive.
This doesn't require a spare disk.
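
To make the reconstruction step concrete, here is a minimal sketch in Python. It assumes a single-parity stripe where the parity block is the XOR of the data blocks, so any one missing block equals the XOR of all the others. The names xor_blocks and scrub_stripe are made up for illustration; this is not the filer's actual code, and the drive-level remap that happens on the re-write is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(stripe, readable):
    """Return a repaired copy of one stripe.

    stripe   -- list of byte blocks, one per drive (data blocks plus parity)
    readable -- list of booleans, False where the drive reported a read error
    (Both parameters are hypothetical stand-ins for what a scrub would see.)
    """
    bad = [i for i, ok in enumerate(readable) if not ok]
    if not bad:
        return list(stripe)  # nothing to fix in this stripe
    if len(bad) > 1:
        # Single parity can only rebuild one missing block per stripe.
        raise ValueError("more than one unreadable block in stripe")
    i = bad[0]
    others = [blk for j, blk in enumerate(stripe) if j != i]
    repaired = list(stripe)
    # Rebuild the unreadable block from the blocks on all the other drives.
    # Re-writing it to the drive is what would trigger the remap.
    repaired[i] = xor_blocks(others)
    return repaired
```

Only the drives already in the stripe are read, and the repaired block goes back to the same drive, which is why no spare is involved.
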
> I guess it still could help with localized media errors, but I've
> never seen any reaction to media errors other than a whole disk
> failing out.
If a WRITE fails, then we do drop the whole disk out. This is because,
with auto-remap, a WRITE should never, ever fail. If it does,
there's something seriously wrong. A read failure during normal
operation is handled as described above with RAID scrubbing. The block
is rebuilt and rewritten so that the drive remaps it.
We added RAID scrub because single block failures do occur, and they
are nasty.
If you've got an undetected bad block, and then a different disk fails
entirely, then the stripe with the bad block will be impossible to
reconstruct, because it has two missing blocks. In this case we zero
both blocks and require the user to run WACK to fix whatever damage
that causes. That generally works, unless the lost block is really
important, but it's much, much better to find and fix the bad block
right away, which is what the scrubbing does.
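
A small sketch of why the combination is fatal, again assuming single parity (one missing block per stripe can be rebuilt, two cannot). The function name classify_stripes and its inputs are hypothetical, purely for illustration.

```python
def classify_stripes(num_stripes, failed_disk, latent_bad_blocks):
    """Say which stripes can still be rebuilt after a whole-disk failure.

    failed_disk       -- index of the disk that failed entirely
    latent_bad_blocks -- set of (stripe, disk) pairs that are unreadable
                         but were never noticed (hypothetical input)
    """
    result = {}
    for stripe in range(num_stripes):
        # Every stripe is missing its block on the failed disk, plus any
        # undetected bad blocks that happen to sit in the same stripe.
        missing = {failed_disk}
        missing |= {disk for (s, disk) in latent_bad_blocks if s == stripe}
        result[stripe] = "rebuildable" if len(missing) <= 1 else "lost"
    return result
```

For example, with three stripes, disk 0 failed, and one undetected bad block on disk 2 in stripe 1, only stripe 1 comes back as "lost". Scrubbing ahead of time empties the set of undetected bad blocks, so every stripe stays rebuildable when a disk does fail.
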
It actually took a year or two before we started seeing this problem in
our install base. We figure that it just takes a while before a system
sits long enough to have files that never get read, and then for a
failure to occur in such a file.
Dave