Thanks once again to all the incredibly helpful folks... Here's my progress so far...
I edited the disk label of the drive whose controller I replaced. It looks just like the others now.
So I booted the system (read-only mode, thanks for that tip! :-), and the volume is actually online and viewable, which is a major step forward.
But I'm getting a lot of errors about inconsistent directory entries:
Mon Jan 7 09:36:45 AST [/e3]: wafl_nfs_lookup: inconsistent directory entry {x20 0 8166461 156921645 16794490} <th.100x100.26.jpg> in {x20 0 15140401 133992804 16794490}.
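To get a rough idea of how many files are affected, the console messages could be tallied with a small script. This is only a sketch: the pattern is based solely on the one line quoted above, the function and group names are my own, and I'm treating the numbers in braces as opaque, since I don't know what each field means.

```python
import re

# Pattern derived only from the single sample message above; other
# variants of the message may need adjustments.
ENTRY_RE = re.compile(
    r"\[(?P<volume>[^\]]+)\]: wafl_nfs_lookup: inconsistent directory entry "
    r"\{(?P<entry>[^}]*)\} <(?P<name>[^>]*)> in \{(?P<parent>[^}]*)\}"
)

def parse_inconsistent_entries(lines):
    """Return (volume, file name, parent directory id string) per matching line."""
    results = []
    for line in lines:
        m = ENTRY_RE.search(line)
        if m:
            # The brace contents are kept as raw strings, not interpreted.
            results.append((m.group("volume"), m.group("name"), m.group("parent")))
    return results
```

Grouping the results by the parent string would show whether the errors cluster in a few directories or are spread across the volume.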
I have a couple of potential courses of action right now:
- Copy all the data off that I can, regardless; this is obviously a good thing to do in any case. Unfortunately, the recursive copies I attempt seem to hang after a short while. Darn...
- Take my drive-with-new-controller out of the set, and let the NetApp attempt a rebuild onto a new drive, ignoring media errors on the second bad drive. I worry that this will create further corruption, but given that my current attempt already has corruption, it might not be any worse. (And I might not be able to get back to where I currently am.) A cleaner (and safer) alternative might be to boot in read-only mode, ignoring media errors, with my controller-replaced drive out of the set. In read-only mode it shouldn't rebuild the set, and with media errors ignored it might be able to access the data in degraded mode (or at least let me see what is available that way).
- Let the NetApp repair the inconsistencies. I'm not sure of the best way to proceed on this one.
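For the first option, one way to keep a recursive copy from stalling on a single bad file might be to copy file by file from a client, each in its own child process with a timeout, and log whatever fails. The following is a minimal sketch under several assumptions: the volume is mounted on a client with Python 3, the function name and the 30-second default are made up for illustration, and the directory walk itself runs in the parent process, so it can still hang on a bad directory. A process stuck in an uninterruptible NFS wait may also not die when killed, so this is no guarantee.

```python
import os
import subprocess
import sys

# Child copies to a ".partial" name and renames on success, so an
# interrupted copy never looks like a finished file.
CHILD_CODE = (
    "import os, shutil, sys; "
    "tmp = sys.argv[2] + '.partial'; "
    "shutil.copy2(sys.argv[1], tmp); "
    "os.replace(tmp, sys.argv[2])"
)

def copy_tree_with_timeout(src_root, dst_root, timeout_s=30):
    """Copy files one at a time; return a list of (path, reason) failures."""
    failures = []

    def on_walk_error(err):
        # Directories that cannot be listed are recorded and skipped.
        failures.append((err.filename, str(err)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            if os.path.exists(dst):
                continue  # already copied on an earlier run
            cmd = [sys.executable, "-c", CHILD_CODE, src, dst]
            try:
                subprocess.run(cmd, check=True, timeout=timeout_s,
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
            except subprocess.TimeoutExpired:
                failures.append((src, "timeout"))
            except subprocess.CalledProcessError as err:
                failures.append((src, "exit %d" % err.returncode))
    return failures
```

Because finished files are skipped, the script can be rerun after a hang or reboot and will pick up where it left off; the returned failure list doubles as a list of files to look for on the tapes.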
I'm working on the tape-restore method in parallel, but anything we can get off via some creative tweaking would be worth a try...
I think I'll try the ignore-media-errors, read-only-mode approach next, to see what that view of the world is like.
-dale