Thanks once again to all the incredibly helpful folks... Here's my progress so far...
I edited the disk label of the drive whose controller I replaced. It looks just like the others now.
So I booted the system (in read-only mode; thanks for that tip! :-), and the volume is actually online and viewable, which is a major step forward.
But I'm getting a lot of errors about inconsistent directory entries:
Mon Jan 7 09:36:45 AST [/e3]: wafl_nfs_lookup: inconsistent directory entry {x20 0 8166461 156921645 16794490} <th.100x100.26.jpg> in {x20 0 15140401 133992804 16794490}.
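Since there are a lot of these messages, it may help to pull the affected file names out of the console log so they can be checked, or restored from tape, later. Below is a minimal Python sketch. It assumes the messages have been captured as plain text lines in exactly the format shown above. The meaning of the five fields inside each {...} isn't known here, so they are kept as opaque strings, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the wafl_nfs_lookup message format quoted above (assumed stable).
# The {...} fields are captured verbatim because their meaning is unknown.
ENTRY_RE = re.compile(
    r"wafl_nfs_lookup: inconsistent directory entry "
    r"\{(?P<entry>[^}]*)\} <(?P<name>[^>]*)> in \{(?P<parent>[^}]*)\}"
)


def affected_entries(lines):
    """Yield (file_name, entry_id, parent_dir_id) for each matching log line."""
    for line in lines:
        match = ENTRY_RE.search(line)
        if match:
            yield match.group("name"), match.group("entry"), match.group("parent")


def count_by_parent(lines):
    """Count inconsistent entries per parent directory id (a rough damage map)."""
    return Counter(parent for _name, _entry, parent in affected_entries(lines))
```

Grouping by parent directory shows whether the damage is concentrated in a few directories, which could help decide what to pull from tape first.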
I have a few potential courses of action right now:
- Copy all the data off that I can, regardless; this is obviously a good thing to do in any case. Unfortunately, the recursive copies I attempt seem to hang after a short while. Darn...
- Take my drive-with-new-controller out of the set, and let the NetApp attempt a rebuild onto a new drive, ignoring media errors on the second bad drive. I worry that this will create further corruption; but given that my current attempt already has corruption, it might not be any worse...? (And I might not be able to get back to where I currently am.) A cleaner (and safer) alternative might be to boot in read-only mode, ignoring media errors, with my controller-replaced drive out of the set. In read-only mode it shouldn't rebuild the set, and with media errors ignored it might be able to access the data in degraded mode (or at least let me see what is available that way...)
- Let the NetApp repair the inconsistencies; I'm not sure of the best way to proceed on this one.
I'm working on the tape-restore method in parallel, but anything we can get off via some creative tweaking would be worth a try...
I think I'll try the ignore-media-errors and read-only-mode thing next, to see what that view of the world is like.
-dale
A bit more information, and yet another question :-)
I took out the drive whose controller I replaced; then I booted in read-only mode, with medium errors disabled. The volume came up in degraded mode, and the data actually looks pretty good so far. I'm copying it off at a pretty good rate, with no hangs or anything. Keeping my fingers crossed.
One question: in "ignore medium errors" mode, if a medium error *is* encountered, is at least a message printed, or is it silently ignored? I'm just wondering whether my seemingly good data might be corrupt. (The nature of the content will let me largely verify its integrity after it's copied, which is good.)
I'm assuming some warning would be printed, at least. The fact that the directory structure shows no corruption so far is also a good sign, I think.
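Since much of the content appears to be JPEG images (e.g. th.100x100.26.jpg in the log above), one cheap post-copy check is to confirm that each copied file starts with the JPEG start-of-image marker (FF D8) and ends with the end-of-image marker (FF D9). Below is a minimal Python sketch under that assumption; the function names are hypothetical. It is only a heuristic: it catches truncated files and files whose first or last bytes were zero-filled, but not damage in the middle of a file. Some valid JPEGs also carry trailing bytes after the end marker, so flagged files deserve a manual look rather than automatic deletion.

```python
import os

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_complete_jpeg(path):
    """Heuristic: True if the file begins with SOI and ends with EOI."""
    with open(path, "rb") as f:
        if f.read(2) != JPEG_SOI:
            return False
        f.seek(0, os.SEEK_END)
        if f.tell() < 4:  # too short to hold both markers separately
            return False
        f.seek(-2, os.SEEK_END)
        return f.read(2) == JPEG_EOI


def suspect_jpegs(root):
    """Walk a copied tree and yield paths of .jpg/.jpeg files that fail the check."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, name)
                if not looks_like_complete_jpeg(path):
                    yield path
```

Run against the copy destination (not the degraded volume), this produces a list of files worth re-copying or restoring from tape.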
-dale
Dale Gass wrote: