Hi:
I'm hoping someone has seen this message before. I've been working with NetApp tech support, but so far they have not been able to duplicate the error in their lab.
I'll start by apologizing for the length of the email. Brevity can result in key details being left out; details that might otherwise explain the situation and lead to the solution. On the other hand, too many details can annoy the crap out of people.
We have two netapps, both model FAS270. Netapp1, the source for ndmpcopy, runs 7.0.4; netapp2, the destination, runs 7.2.7. We use ndmpcopy to copy volumes from netapp1 to netapp2. Intermittently (though it seems to occur more frequently now), the ndmpcopy fails to run and gives the following message:
Body error NDMP_ILLEGAL_STATE_ERR in reply message NdmpMessageDataStartRecover from destination
Feb 25 06:00:05 CST [ndmpc:233]: Failed to start restore on destination
There is no pattern to which volume, or what size of volume, the ndmpcopy operation fails on. It can occur on one of our smaller volumes (1 GB) or on our largest (765 GB). The error occurs whether the ndmpcopy operation is initiated from the console or started from a Perl script on the administrative host for the destination filer. Though the two netapps are physically located in separate buildings that are miles apart, they are on the same subnet and no other network issues regarding them have occurred.
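Since the failure is intermittent, one stopgap for the scripted runs is to retry only when this specific error appears. The following is a minimal Python sketch of that idea, not our actual Perl script. The function name, the retry count, and the delay are assumptions made for illustration, and the command passed in is whatever the script already runs. It also treats the error string in the output as the failure signal, since I have not confirmed what exit status ndmpcopy returns.

```python
import subprocess
import time

ILLEGAL_STATE = "NDMP_ILLEGAL_STATE_ERR"


def run_with_retry(cmd, attempts=3, delay=60, runner=None, sleep=time.sleep):
    """Run cmd, retrying only when its output contains NDMP_ILLEGAL_STATE_ERR.

    Returns a tuple (succeeded, attempts_used, last_output).
    `runner` is a hypothetical hook taking the command and returning
    (exit_code, combined_output); it defaults to subprocess.run.
    """
    if runner is None:
        def runner(command):
            proc = subprocess.run(command, capture_output=True, text=True)
            return proc.returncode, proc.stdout + proc.stderr

    output = ""
    for attempt in range(1, attempts + 1):
        code, output = runner(cmd)
        saw_illegal_state = ILLEGAL_STATE in output
        if code == 0 and not saw_illegal_state:
            return True, attempt, output
        if not saw_illegal_state:
            # A different failure: report it rather than masking it by retrying.
            return False, attempt, output
        if attempt < attempts:
            # Give the destination time to settle before trying again.
            sleep(delay)
    return False, attempts, output
```

A retry does not explain the error, but counting how many attempts each run needs would at least show whether the failures cluster in time.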
There is only one instance of the error message in the knowledgebase and it concerns a SAN library with BakBone NetVault with NDMP plugin.
A review of the NDMP spec at ndmp.org suggests the error can mean one of the following:
1) Illegal state. A data operation is not currently in progress.
2) Tape record size cannot be set in the current state.
3) A data operation is already in progress. Only one data operation is allowed to be executing at a time.
4) Illegal state. NDMP server not in the halted state.
5) Message cannot be processed in current state.
I checked the state of the ndmpd server on the destination, netapp2, and there are no other sessions. All the others are things I have no control over.
Some of the things I've tried:
o Toggled the ndmpd daemon off and on for both source and destination
o Set the ndmpd debug level to 70
o Set the debug option on the ndmpcopy command
o Set the ndmpd level to three (had been four)
Regarding the detailed ndmpdlog, there is no explanation as to what causes the error (only the error):
Feb 25 06:00:05 CST [ndmpd:73]: Restore type: dump
Feb 25 06:00:05 CST [ndmpd:73]: Error code: NDMP_ILLEGAL_STATE_ERR <---
Feb 25 06:00:05 CST [ndmpd:73]: Key: EXTRACT, Value: N
Feb 25 06:00:05 CST [ndmpd:73]: Key: EXCLUDE_PATH, Value: /etc
Feb 25 06:00:05 CST [ndmpd:73]: Key: FILESYSTEM, Value: /vol/vol0
Feb 25 06:00:05 CST [ndmpd:73]: Key: BASE_DATE, Value: 9856860996
Feb 25 06:00:05 CST [ndmpd:73]: Key: UPDATE, Value: Y
Feb 25 06:00:05 CST [ndmpd:73]: Key: LEVEL, Value: 2 <---
Feb 25 06:00:05 CST [ndmpd:73]: Key: DMP_NAME, Value: ndmpcopy:/vol/vol0//vol/vol0/128.248.155.101128.248.155.102
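To compare failed runs side by side, log lines in this shape can be grouped mechanically. Below is a rough Python sketch that does so. It assumes that the number in `[ndmpd:73]` identifies the NDMP session and that the trailing `<---` arrows are my own annotations rather than part of the log. The function name and the returned structure are made up for illustration.

```python
import re

# Matches lines such as:
#   Feb 25 06:00:05 CST [ndmpd:73]: Key: LEVEL, Value: 2
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<daemon>\w+):(?P<session>\d+)\]: (?P<msg>.*)$"
)
KV_RE = re.compile(r"^Key: (?P<key>\w+), Value: (?P<value>.*)$")
ERROR_PREFIX = "Error code: "


def parse_ndmpd_lines(lines):
    """Group Key/Value pairs and error codes by the bracketed number.

    Returns {session: {"env": {key: value}, "errors": [(stamp, code)]}}.
    Treating the bracketed number as a session id is an assumption.
    """
    sessions = {}
    for raw in lines:
        line = raw.strip()
        if line.endswith("<---"):
            # Drop the hand-added marker arrows.
            line = line[:-4].rstrip()
        match = LINE_RE.match(line)
        if not match:
            continue
        entry = sessions.setdefault(match["session"], {"env": {}, "errors": []})
        msg = match["msg"]
        kv = KV_RE.match(msg)
        if kv:
            entry["env"][kv["key"]] = kv["value"]
        elif msg.startswith(ERROR_PREFIX):
            entry["errors"].append((match["stamp"], msg[len(ERROR_PREFIX):]))
    return sessions
```

Running this over the log from several failed copies would make it easier to check whether LEVEL, FILESYSTEM, or BASE_DATE have anything in common across failures.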
As you can see, this failure occurred on a level two operation; it has occurred on the other two levels as well.
I believe, though I can't say for certain, that the failure rate increased after upgrading ONTAP on the destination from 7.2.2 to 7.2.7.
Please do not suggest we license snapmirror. We are a university in Illinois, a state whose financial situation is only slightly better than that of California. If I can't solve the problem, then *** yes *** I'll try vol copy.
Michael Homa
Operating Systems Support and Database Group
Academic Computing and Communication Center
University of Illinois at Chicago
email: mhoma@uic.edu
Michael,
> We have two netapps, both model FAS270. Netapp1, the source for ndmpcopy, runs 7.0.4; netapp2, the destination, runs 7.2.7. We use ndmpcopy to copy volumes from netapp1 to netapp2. Intermittently (though it seems to occur more frequently now), the ndmpcopy fails to run and gives the following message:
>
> Body error NDMP_ILLEGAL_STATE_ERR in reply message NdmpMessageDataStartRecover from destination
> Feb 25 06:00:05 CST [ndmpc:233]: Failed to start restore on destination
>
> There is no pattern regarding on which volume or the size of the volume the ndmpcopy operation will fail. It can occur on one of our smaller volumes (1g) or on our largest (765g). The error will occur when the
Intermittent failures. Our favorite kind!
Are you able to reproduce the failure on each filer separately, with the source/destination being the same (netapp1<->netapp1, netapp2<->netapp2)? How about in the opposite direction?
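To keep those combinations straight, here is a small Python sketch that enumerates every source/destination pairing (same-filer copies and both directions) and tallies a failure rate per pairing. The function names and the shape of the results list are hypothetical; the actual copies would still be run by hand or from your script.

```python
from collections import Counter
from itertools import product


def copy_test_matrix(filers):
    """Return every (source, destination) pair, including same-filer copies."""
    return list(product(filers, repeat=2))


def failure_rates(results):
    """Compute the fraction of failed runs per (source, destination) pair.

    `results` is an iterable of ((source, destination), succeeded) tuples.
    """
    runs, failures = Counter(), Counter()
    for pair, succeeded in results:
        runs[pair] += 1
        if not succeeded:
            failures[pair] += 1
    return {pair: failures[pair] / runs[pair] for pair in runs}
```

If only the pairings with netapp2 as the destination ever fail, that would point at the destination's ndmpd rather than at the network or the source.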
What sort of authentication are you using? Looks like you're digging through /etc/log/ndmpdlog. Is there anything telling in /etc/log/ndmpcopy.<date>?
-Kevin
*------------------------------------------*-----------------------*
| Kevin Davis (UNIX/Storage Sysadmin)      | Natick, Massachusetts |
| 508.647.7660                             | 01760-2098            |
| mailto:kevin.davis@mathworks.com         *-----------------------*
| http://www.mathworks.com                 |                       |
*------------------------------------------*-----------------------*