Has anyone seen this one?
I have been having problems copying a volume to another volume on the same filer.
I did a level 0 copy with ndmpcopy and that appeared to work OK. At the end the restore had these errors right before it exited:
home1.Virginia.EDU: DUMP: HALT: The operation was successful!
Waiting for home1.Virginia.EDU RESTORE to halt too.
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: bad entry: incomplete operations
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: name: ./h1/e/en/enh7f/RSTTMP05989666
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: parent name ./h1/e/en/enh7f
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: sibling name: ./h1/e/en/enh7f/Grigory Pechorin as Superflous Psychopath.doc
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: entry type: LEAF
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: inode number: 5989666
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: flags: TMPNAME
home1.Virginia.EDU: RESTORE: HALT: The operation was successful!
The transfer is complete.
Elapsed time: 12 hours, 13 minutes, 24 seconds.
I'm not convinced that the restore was indeed 100% successful.
This filer is in production, so I can't do the whole copy during a downtime. A few days after the level 0 I tried to do a level 1 ndmpcopy to pick up any changes, and the restore failed with these errors:
home1.Virginia.EDU: LOG: DUMP: home1.Virginia.EDU: LOG: dumping (Pass IV) [regular files]
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: Warning: cannot remove file banner.htm: No such file or directory
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: bad entry: not marked REMOVED
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: name: ./lowinum/banner.htm
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: parent name ./lowinum
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: entry type: LEAF
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: inode number: 9024
home1.Virginia.EDU: LOG: RESTORE: home1.Virginia.EDU: LOG: flags: REMOVED|KEEP
home1.Virginia.EDU: RESTORE: HALT: The operation was successful!
Waiting for home1.Virginia.EDU DUMP to halt too.
(If it sits here forever, the transfer was successful, but the source filer has hung. Press ^C.)
home1.Virginia.EDU: LOG: DUMP: home1.Virginia.EDU: LOG: Error writing to standard output: Broken pipe
home1.Virginia.EDU: LOG: DUMP: home1.Virginia.EDU: LOG: DUMP IS ABORTED
home1.Virginia.EDU: Connection halted: HALT: Internal error!
Elapsed time: 0 hours, 21 minutes, 14 seconds.
Maybe restore didn't like the restore_symboltable file, or perhaps it didn't like the dump itself. After a few tries with different snapshots, I gave up on ndmpcopy. I decided I would have to use rsh dump | rsh restore with the "x" option on restore rather than "r", which ndmpcopy uses. This would simply restore the level 1 dump without trying to remove files deleted on the live filesystem between the level 0 and the level 1. I didn't trust the restore_symboltable file, so I moved it aside, even though "x" isn't supposed to use it.
Then I ran this command:
rsh filer -l root -n 'dump 1uf - /vol/vol0/.snapshot/level1' | \
    rsh filer -l root 'restore xfD - /vol/vol1'
This seemed to be working fine and the dump finished according to the output from rsh, but the restore is still running over an hour later and it's gobbling up 100% of the filer CPU. Furthermore, it seems to be ZEROING OUT ALL THE FILES THAT IT RESTORED!
While the dump was running, I checked my own home directory on the new volume because I had changed some files after the level 0. Sure enough, some of the new files were there and intact, so restore had restored them. Now that the dump is done, I checked those same files again and they are all 0 length! What's going on?
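For anyone who wants to check for the same symptom across a whole tree rather than file by file, here is a minimal sketch. It assumes both volumes are NFS-mounted on an admin host; the mount points /mnt/vol0 and /mnt/vol1 and the function name find_zeroed are made up for illustration. It lists files that are empty on the new volume but non-empty on the old one.

```shell
#!/bin/sh
# find_zeroed OLD NEW
# Print the relative path of every regular file that is zero length under
# NEW but exists with a non-zero size under OLD.
# OLD and NEW are assumed to be mount points of the two volumes; the
# paths in the usage note are hypothetical.
find_zeroed() {
    old=$1
    new=$2
    # -type f limits the search to regular files; -size 0 matches empty ones.
    # The cd happens in a subshell, so "$old" is still resolved from the
    # caller's working directory in the loop below.
    (cd "$new" && find . -type f -size 0 -print) |
    while IFS= read -r f; do
        # [ -s FILE ] is true when FILE exists and is larger than zero bytes.
        if [ -s "$old/$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage would be something like `find_zeroed /mnt/vol0 /mnt/vol1`. Names containing spaces are handled, but names containing newlines are not. Files that were legitimately truncated on the live filesystem after the snapshot would also not be distinguished from damaged ones unless the old side is the snapshot itself.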
The old volume has 120G used, but the level 1 dump was only 3480944 K according to dump (a bit over 3G). The new volume is twice as big as the old one.
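As a quick check of that size, assuming the "K" that dump reports means units of 1024 bytes (the helper name kb_to_gib is made up for illustration):

```shell
#!/bin/sh
# kb_to_gib KILOBYTES
# Convert a count of 1024-byte kilobytes to gigabytes (1024 * 1024 KB),
# printed with two decimal places.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024) }'
}

kb_to_gib 3480944    # prints 3.32
```

So the level 1 dump is roughly 3.3G against 120G used on the volume.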
The "rsh dump" process has exited, but the "rsh restore" process is still there and the filer CPU is at 100%.
I just killed the "rsh restore" and got this error:
RESTORE: Interrupted
Now the filer CPU is down to 10% or so.
We are running DOT 5.3.5R2P2.
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support