Hi All,

This is my first post, so please be gentle. Not sure if the screen grabs will work or not (resent without screengrabs) and I hope I have removed any IP addresses etc.

 

Basically we upgraded our 8.3.2 CDOT systems to 9.1P1 at the weekend (1 x Fas 2552 & 2 x FAS8040)

We then started getting the below email on every snapmirror update;

-----Original Message-----
From:
Sent: 25 March 2017 14:07
To:
Subject: ukhc3-ntap-b: smc.snapmir.update.fail [LOG_ERR]

 

 

Filer: ukhc3-ntap-b

Time: Sat, Mar 25 14:07:01 2017 +0000

Severity: LOG_ERR

 

Message: smc.snapmir.update.fail: Snapmirror update from source volume 'ntap-cl1-svmcifs:v_ukliss_cifs_sclive_t4r2mv' to destination volume 'ukliss-ntap-cl1-svmcifs_DR:mv_ukliss_cifs_sclive_t4r2' failed with error 'Note=Scheduled update operation is queued.'. Relationship UUID '18916a2e-9dae-11e6-b319-00a098a01c55'.

 

Description: This message occurs when a scheduled SnapMirror(R) transfer fails.

 

Action: Check the snapmirror_error log in the destination filer and take action accordingly.

 

Source: sm_logger_main

Index: 3572349

 

[Note: This email message is sent using a deprecated event routing mechanism.

For information, search the knowledgebase of the NetApp support web site for "convert existing event configurations in Data ONTAP 9.0."]

--------------------------

Yet, when checking the relationships, all appears ok and state is healthy.

 

Snapmirror_error log shows these “delete failed” errors;

 

Sun Mar 26 08:05:01 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=46f5c6a8-2c4d-4bf5-aa7c-cf85563d0897 Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [114] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_070500" delete failed

Sun Mar 26 08:05:01 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:098469ee-0e4d-11e7-b319-00a098a01c55 Operation-Uuid=7e139dd9-25b1-474e-9b5c-6c8ce39a88cf Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [112] Snapshot "snapmirror.30446395-2d51-11e6-86a9-00a098a01c55_2158347374.2017-03-26_040500" delete failed

Sun Mar 26 08:05:03 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=bfd58d04-5839-4dd8-9764-5bd5690b219b Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [113] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_070500" delete failed

Sun Mar 26 09:05:01 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [120] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_080500" delete failed

Sun Mar 26 09:05:03 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=baa2cbbb-86b1-49e0-96a2-250973362b7b Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [119] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_080500" delete failed

 

This is confirmed by doing a snap list;

 

ukhc3::>  snap list mv_ukliss_cifs_scarchive_t4r2

                  snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_080500 3.16MB 0% 0%

                  snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_090500 2.93MB 0% 0%

 

But the audit log confirms they are being deleted an hour later (confirmed time and zone all correct);

 

Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Info sm_rpm_snaplist_delete([120]): Snapshot snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_070500 deleted.

Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Info  snapshot(s) up to snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_090500 will be selected for transfer.

Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Transfer_snapshots (1 of 1 log entries): snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_090500

Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=baa2cbbb-86b1-49e0-96a2-250973362b7b Group=none Operation-Cookie=0 action=Info sm_rpm_snaplist_delete([119]): Snapshot snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_070500 deleted.

 

 

Relationship is also healthy in oCUM (screengrab removed)

 

 

However, this line is present in the snapmirror audit log;

Sun Mar 26 07:05:00 BST 2017 CreateSnapshotDest[Mar 26 07:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=93851169-93b6-4067-abfb-9948c09920c8 Group=none Operation-Cookie=0 action=End source=ntap-cl1-svmcifs:v_ukliss_cifs_sclive_t4r2mv destination=ukliss-ntap-cl1-svmcifs_DR:mv_ukliss_cifs_sclive_t4r2 status=Success Note=Scheduled update operation is queued.

 

Success but queued?

 

 

Just to see if it was an issue with the upgrade I have created multiple test relationships;

Local Aggr1 to aggr2

In reverse

Between the 2552 & the 8040’s

Between the 8040’s

 

but getting the same issue – (received same email as above)

 

Indeed at this point I have come to the conclusion that we have hit the bug: 1042534

http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1042534

 

 

Has anyone else experienced this? Or can offer up some useful advice as to our next steps.

 

Currently this system is only partially in production as a mv target, so the email spam is not too bad, but we plan to add many relationships very soon.

 

Kind Regards,

Chris.