Well I recognise it! It happened 9 times out of 10 when I was using NDMPCopy a month or so back. The source filer will eventually give up, but only after an hour or two...
Because a given filer only has 5 NDMP sessions to go round, and I was copying qtrees between filers, I had to spend lots o' time manually setting up and tearing down NDMPCopies. I couldn't rely on a script to keep the copies going as fast as possible, and I was constrained by time.
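For what it's worth, the bookkeeping a script would need for the 5-session ceiling is small. Below is a minimal sketch in Python, not something I ran against real filers. It assumes each NDMPCopy holds one session on its source filer and one on its destination for as long as it runs; `Job`, `pick_startable` and `build_command` are hypothetical names, and the command line simply mirrors the invocation quoted further down rather than any documented option list.

```python
from collections import Counter
from typing import NamedTuple

MAX_SESSIONS = 5  # per-filer NDMP session count mentioned in the message


class Job(NamedTuple):
    src: str  # "filer:/path" to copy from
    dst: str  # "filer:/path" to copy to


def filer(spec: str) -> str:
    """Return the filer name from a "filer:/path" spec."""
    return spec.split(":", 1)[0]


def pick_startable(pending, running, limit=MAX_SESSIONS):
    """Return the pending jobs that may start now without pushing any
    filer past `limit` sessions.

    Assumption: every copy uses one session on each filer it touches.
    """
    load = Counter()
    for job in running:
        load[filer(job.src)] += 1
        load[filer(job.dst)] += 1

    chosen = []
    for job in pending:
        # A copy within a single filer needs two sessions on that filer.
        need = Counter([filer(job.src), filer(job.dst)])
        if all(load[name] + n <= limit for name, n in need.items()):
            chosen.append(job)
            load.update(need)
    return chosen


def build_command(job, src_auth, dst_auth):
    """Build an argument list shaped like the invocation quoted below.

    The flags are copied from that one example only (assumption).
    """
    return ["./ndmpcopy", "-v", job.src, job.dst,
            "-sa", src_auth, "-da", dst_auth]
```

A driver loop would call `pick_startable` each time a copy finishes and launch whatever it returns. That does nothing about copies that hang, which was the real reason I ended up doing it by hand.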
I also had anomalously low throughput on one pair of filers which remained constant no matter how many concurrent NDMPCopies I set running, and another pair would crash if I tried to set up concurrent NDMPCopies, though the throughput was sufficiently high (ten times the speed of the stable-but-slow pair) that that didn't impact the overall operation.
In case you're wondering, the first pair was pretty much identical to the second pair, give or take a shelf of disks or two. We've not yet figured out the reason for the difference in stability or performance. The source filers were F540s and the targets were F630s, all on one FDDI ring, with the NDMPCopies run from two E4000's running 2.5.1, fully and equivalently patched to the gills.
Nevertheless, I used to hate setting up back-to-back dump restores and NDMPCopy is gonna be a permanent fixture in our toolbox (until something even better happens 8)
On Oct 30, 6:43, Brian Tao wrote:
Subject: Source NDMP server hangs after successful dump?

I downloaded ndmpcopy 1.1 from ftp.ndmp.org and used it to duplicate the data across filers. In every case using four different F230's, the dump/restore was successful, but the ndmpcopy never exits, suggesting the source filer may be hung. All the data was transferred, and the source filer is still alive and serving NFS, except it doesn't know that the dump is finished. All the filers are running 4.2a, as shipped. Is this a known problem?

# ./ndmpcopy -v adm1-na1:/home/home1 adm1-na3:/home/home3 -sa root:******** -da root:********
Connecting to adm1-na1.
Connecting to adm1-na3.
adm1-na1: CONNECT: Connection established.
adm1-na3: CONNECT: Connection established.
adm1-na1: LOG: DUMP: creating "snapshot_for_dump.1" snapshot.
adm1-na1: LOG: DUMP: Date of this level 0 dump: Wed Oct 29 12:11:30 1997
adm1-na1: LOG: DUMP: Date of last level 0 dump: the epoch
adm1-na1: LOG: DUMP: Dumping /home/home1/ to NDMP connection
adm1-na1: LOG: DUMP: mapping (Pass I) [regular files]
adm1-na1: LOG: DUMP: mapping (Pass II) [directories]
adm1-na1: LOG: DUMP: estimated 3373500 tape blocks.
adm1-na1: LOG: DUMP: dumping (Pass III) [directories]
adm1-na1: LOG: DUMP: dumping (Pass IV) [regular files]
adm1-na1: LOG: DUMP: 15% done, finished in 0:27
adm1-na1: LOG: DUMP: 32% done, finished in 0:20
adm1-na1: LOG: DUMP: 49% done, finished in 0:15
adm1-na1: LOG: DUMP: 66% done, finished in 0:10
adm1-na1: LOG: DUMP: 82% done, finished in 0:05
adm1-na1: LOG: DUMP: 99% done, finished in 0:00
adm1-na3: HALT: The operation was successful!
Waiting for adm1-na1 to halt too.
(If it sits here forever, the transfer was successful, but the source filer has hung. Press ^C.)
Elapsed time: 0 hours, 32 minutes, 10 seconds.
^C
0.04u 0.00s 37:02.48 0.0%

--
Brian Tao (BT300, taob@netcom.ca)
"Though this be madness, yet there is method in't"

-- End of excerpt from Brian Tao
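
The "Press ^C" step in that transcript can in principle be automated. Here is a rough Python sketch, not a tested fix: it watches the ndmpcopy output for the destination's success HALT line and, if the process is still running after a grace period, terminates it. `dest_halted_ok`, `run_with_grace` and the 300-second default are hypothetical choices. The message text is taken from the one transcript above, and the sketch assumes ndmpcopy flushes each line as it prints it even when writing to a pipe, which I have not checked.

```python
import subprocess
import threading

# Text of the destination's success message, as seen in the transcript.
SUCCESS_MARK = "HALT: The operation was successful!"


def dest_halted_ok(line: str, dest: str) -> bool:
    """True if `line` is the destination filer's successful HALT message."""
    return line.startswith(f"{dest}: ") and SUCCESS_MARK in line


def run_with_grace(cmd, dest, grace=300.0):
    """Run `cmd`; once the destination reports success, allow `grace`
    seconds for a normal exit, then terminate the process.

    Returns (destination_reported_success, exit_code).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    ok = False
    timer = None
    # Iteration ends at EOF, i.e. when the process exits or is terminated.
    for line in proc.stdout:
        if not ok and dest_halted_ok(line.strip(), dest):
            ok = True
            timer = threading.Timer(grace, proc.terminate)
            timer.daemon = True
            timer.start()
    code = proc.wait()
    if timer is not None:
        timer.cancel()  # harmless if the timer has already fired
    return ok, code
```

A first element of `True` together with a non-zero exit code would then correspond to the case in the transcript: data transferred, source filer never halting. Whether it is safe to walk away from the source filer in that state is a separate question.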