Hi All,
This is my first post, so please be gentle. Not sure if the screen grabs will work or not and I hope I have removed any IP addresses etc.
Basically we upgraded our 8.3.2 CDOT systems to 9.1P1 at the weekend (1 x Fas 2552 & 2 x FAS8040)
We then started getting the below email on every snapmirror update;
-----Original Message-----
From:
Sent: 25 March 2017 14:07
To:
Subject: ukhc3-ntap-b: smc.snapmir.update.fail [LOG_ERR]
Filer: ukhc3-ntap-b
Time: Sat, Mar 25 14:07:01 2017 +0000
Severity: LOG_ERR
Message: smc.snapmir.update.fail: Snapmirror update from source volume 'ntap-cl1-svmcifs:v_ukliss_cifs_sclive_t4r2mv' to destination volume 'ukliss-ntap-cl1-svmcifs_DR:mv_ukliss_cifs_sclive_t4r2' failed with error 'Note=Scheduled update
operation is queued.'. Relationship UUID '18916a2e-9dae-11e6-b319-00a098a01c55'.
Description: This message occurs when a scheduled SnapMirror(R) transfer fails.
Action: Check the snapmirror_error log in the destination filer and take action accordingly.
Source: sm_logger_main
Index: 3572349
[Note: This email message is sent using a deprecated event routing mechanism.
For information, search the knowledgebase of the NetApp support web site for "convert existing event configurations in Data ONTAP 9.0."]
--------------------------
Yet, when checking the relationships, all appears ok and state is healthy.
Snapmirror_error log shows these “delete failed” errors;
Sun Mar 26 08:05:01 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=46f5c6a8-2c4d-4bf5-aa7c-cf85563d0897 Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682
Workflow ID [114] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_070500"
delete failed
Sun Mar 26 08:05:01 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:098469ee-0e4d-11e7-b319-00a098a01c55 Operation-Uuid=7e139dd9-25b1-474e-9b5c-6c8ce39a88cf Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682
Workflow ID [112] Snapshot "snapmirror.30446395-2d51-11e6-86a9-00a098a01c55_2158347374.2017-03-26_040500"
delete failed
Sun Mar 26 08:05:03 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=bfd58d04-5839-4dd8-9764-5bd5690b219b Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682
Workflow ID [113] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_070500"
delete failed
Sun Mar 26 09:05:01 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682
Workflow ID [120] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_080500"
delete failed
Sun Mar 26 09:05:03 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=baa2cbbb-86b1-49e0-96a2-250973362b7b Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682
Workflow ID [119] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_080500"
delete failed
This is confirmed by doing a snap list;
ukhc3::> snap list mv_ukliss_cifs_scarchive_t4r2
snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_080500
3.16MB 0% 0%
snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_090500 2.93MB 0% 0%
But the audit log confirms they are being deleted an hour later (confirmed time and zone all correct);
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Info
sm_rpm_snaplist_delete([120]): Snapshot snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_070500
deleted.
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Info
snapshot(s) up to snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_090500 will be selected for transfer.
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Transfer_snapshots
(1 of 1 log entries): snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_090500
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=baa2cbbb-86b1-49e0-96a2-250973362b7b Group=none Operation-Cookie=0 action=Info
sm_rpm_snaplist_delete([119]): Snapshot snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_070500
deleted.
Relationship is also healthy in oCUM
However, this line is present in the snapmirror audit log;
Sun Mar 26 07:05:00 BST 2017 CreateSnapshotDest[Mar 26 07:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=93851169-93b6-4067-abfb-9948c09920c8 Group=none
Operation-Cookie=0 action=End source=ntap-cl1-svmcifs:v_ukliss_cifs_sclive_t4r2mv destination=ukliss-ntap-cl1-svmcifs_DR:mv_ukliss_cifs_sclive_t4r2
status=Success Note=Scheduled update operation is queued.
Success but queued?
Just to see if it was an issue with the upgrade I have created multiple
test relationships;
Local Aggr1 to aggr2
In reverse
Between the 2552 & the 8040’s
Between the 8040’s
but getting the same issue;
Indeed at this point I have come to the same conclusion that we have hit the bug:
1042534
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1042534
Has anyone else experienced this? Or can offer up some useful advice as to our next steps.
Currently this system is only partially in production as a mv target, so the email spam is not too bad, but we plan to add many relationships very soon.
Kind Regards,
Chris.