Hi,
has anyone ever seen a RAID reconstruction start immediately after an ONTAP upgrade?
I just upgraded one of my older filers to 8.3.1P2, and it is now running a reconstruction on one of its aggregates for no obvious reason (at least to me).
During the boot of this controller after the upgrade, I saw the following message on the console, which did not show up on the second controller:
Creating trace file /etc/log/rastrace/RAID_0_20170402_17:18:00:095890.dmp
No disks show as broken, in maintenance mode, or anything like that, so any hints would be welcome.
Here's the output of `aggr status -r` on this controller:
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0b.22.23 0b 22 23 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0b.22.4 0b 22 4 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.5 2a 21 5 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.5 0b 22 5 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.6 2a 21 6 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.6 0b 22 6 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.7 2a 21 7 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.7 0b 22 7 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (reconstruction 3% completed)
data 2a.21.8 2a 21 8 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.8 0b 22 8 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.9 2a 21 9 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.9 0b 22 9 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.10 2a 21 10 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.10 0b 22 10 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.11 2a 21 11 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.11 0b 22 11 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.12 2a 21 12 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.12 0b 22 12 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 2a.21.13 2a 21 13 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
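For reference, these are roughly the other checks I have run (or still plan to run) from the clustershell; the aggregate and node names are replaced with placeholders here, and exact field names may differ slightly on 8.3:
::> storage disk show -broken
::> storage aggregate show -aggregate <aggr_name> -fields state,raidstatus
::> system node run -node <node_name> -command "sysconfig -r"
::> event log show -node <node_name>
None of that tells me why the rebuild started; I assume the rastrace dump from the console message above has more detail, but that it is only readable by support.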
Thanks,
Alexander Griesser
Head of Systems Operations
ANEXIA Internetdienstleistungs GmbH
E-Mail: AGriesser@anexia-it.com
Web: http://www.anexia-it.com
Registered office Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing Director: Alexander Windbichler
Commercial register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT number: AT U63216601
Hi All,
This is my first post, so please be gentle. I'm not sure if the screen grabs will come through, and I hope I have removed any IP addresses etc.
Basically, we upgraded our 8.3.2 cDOT systems to 9.1P1 at the weekend (1 x FAS2552 & 2 x FAS8040).
We then started getting the email below on every SnapMirror update:
-----Original Message-----
From:
Sent: 25 March 2017 14:07
To:
Subject: ukhc3-ntap-b: smc.snapmir.update.fail [LOG_ERR]
Filer: ukhc3-ntap-b
Time: Sat, Mar 25 14:07:01 2017 +0000
Severity: LOG_ERR
Message: smc.snapmir.update.fail: Snapmirror update from source volume 'ntap-cl1-svmcifs:v_ukliss_cifs_sclive_t4r2mv' to destination volume 'ukliss-ntap-cl1-svmcifs_DR:mv_ukliss_cifs_sclive_t4r2' failed with error 'Note=Scheduled update operation is queued.'. Relationship UUID '18916a2e-9dae-11e6-b319-00a098a01c55'.
Description: This message occurs when a scheduled SnapMirror(R) transfer fails.
Action: Check the snapmirror_error log in the destination filer and take action accordingly.
Source: sm_logger_main
Index: 3572349
[Note: This email message is sent using a deprecated event routing mechanism.
For information, search the knowledgebase of the NetApp support web site for "convert existing event configurations in Data ONTAP 9.0."]
--------------------------
Yet when checking the relationships, everything appears OK and the state is healthy.
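By "healthy" I mean a check along these lines against the destination path from the alert (fields trimmed for brevity), which comes back clean:
ukhc3::> snapmirror show -destination-path ukliss-ntap-cl1-svmcifs_DR:mv_ukliss_cifs_sclive_t4r2 -fields state,status,healthy,unhealthy-reason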
The snapmirror_error log shows these "delete failed" errors:
Sun Mar 26 08:05:01 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=46f5c6a8-2c4d-4bf5-aa7c-cf85563d0897 Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [114] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_070500" delete failed
Sun Mar 26 08:05:01 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:098469ee-0e4d-11e7-b319-00a098a01c55 Operation-Uuid=7e139dd9-25b1-474e-9b5c-6c8ce39a88cf Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [112] Snapshot "snapmirror.30446395-2d51-11e6-86a9-00a098a01c55_2158347374.2017-03-26_040500" delete failed
Sun Mar 26 08:05:03 BST 2017 ScheduledUpdate[Mar 26 08:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=bfd58d04-5839-4dd8-9764-5bd5690b219b Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [113] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_070500" delete failed
Sun Mar 26 09:05:01 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [120] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_080500" delete failed
Sun Mar 26 09:05:03 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=baa2cbbb-86b1-49e0-96a2-250973362b7b Group=none Operation-Cookie=0 sm_rpm_snaplist_delete:6682 Workflow ID [119] Snapshot "snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_080500" delete failed
This is confirmed by doing a snap list:
ukhc3::> snap list mv_ukliss_cifs_scarchive_t4r2
snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_080500 3.16MB 0% 0%
snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_090500 2.93MB 0% 0%
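The same leftover snapshots are also visible via the full snapshot command, e.g. something like:
ukhc3::> volume snapshot show -volume mv_ukliss_cifs_scarchive_t4r2 -snapshot snapmirror* -fields create-time,owners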
But the audit log confirms they are being deleted an hour later (time and time zone confirmed correct):
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Info sm_rpm_snaplist_delete([120]): Snapshot snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_070500 deleted.
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Info snapshot(s) up to snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_090500 will be selected for transfer.
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=3b53334a-8550-4f32-890c-f456af8ea6d5 Group=none Operation-Cookie=0 action=Transfer_snapshots (1 of 1 log entries): snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347364.2017-03-26_090500
Sun Mar 26 09:05:00 BST 2017 ScheduledUpdate[Mar 26 09:05:00]:f2b650fe-a011-11e6-b319-00a098a01c55 Operation-Uuid=baa2cbbb-86b1-49e0-96a2-250973362b7b Group=none Operation-Cookie=0 action=Info sm_rpm_snaplist_delete([119]): Snapshot snapmirror.c677ecdf-3df5-11e6-86a9-00a098a01c55_2158347365.2017-03-26_070500 deleted.
The relationship is also healthy in OCUM:
[screenshot: OnCommand Unified Manager showing the relationship as healthy]
However, this line is present in the snapmirror audit log:
Sun Mar 26 07:05:00 BST 2017 CreateSnapshotDest[Mar 26 07:05:00]:18916a2e-9dae-11e6-b319-00a098a01c55 Operation-Uuid=93851169-93b6-4067-abfb-9948c09920c8 Group=none Operation-Cookie=0 action=End source=ntap-cl1-svmcifs:v_ukliss_cifs_sclive_t4r2mv destination=ukliss-ntap-cl1-svmcifs_DR:mv_ukliss_cifs_sclive_t4r2 status=Success Note=Scheduled update operation is queued.
Success but queued?
Just to see if it was an issue with the upgrade, I have created multiple test relationships:
- Local aggr1 to aggr2
- In reverse
- Between the 2552 & the 8040s
- Between the 8040s
but I am getting the same issue (rough commands for one of the test pairs are sketched below the screenshot):
[screenshot: the same smc.snapmir.update.fail alert on the test relationships]
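For what it's worth, each test pair was set up with the usual sequence, roughly like this (the SVM, volume and aggregate names below are made-up placeholders; the policy and schedule are just what I picked for testing):
ukhc3::> volume create -vserver svm_test -volume test_dst -aggregate aggr2 -size 1g -type DP
ukhc3::> snapmirror create -source-path svm_test:test_src -destination-path svm_test:test_dst -type XDP -policy MirrorAllSnapshots -schedule hourly
ukhc3::> snapmirror initialize -destination-path svm_test:test_dst
ukhc3::> snapmirror update -destination-path svm_test:test_dst
Every scheduled update on these test relationships produces the same alert, even though snapmirror show reports them as healthy.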
Indeed, at this point I have come to the conclusion that we have hit bug 1042534:
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1042534
Has anyone else experienced this, or can anyone offer some useful advice on our next steps?
Currently this system is only partially in production as a mv target, so the email spam is not too bad, but we plan to add many relationships very soon.
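In the meantime, to keep the mailbox usable once more relationships land, I am considering converting the old event route email config to the ONTAP 9 notification mechanism (as the footer of the alert suggests) and excluding this one message. Something roughly like the following; the filter/destination names and the address are made up, and the severity list would need tuning:
ukhc3::> event notification destination create -name storage-mail -email storage-team@example.com
ukhc3::> event filter create -filter-name important-no-smc
ukhc3::> event filter rule add -filter-name important-no-smc -type exclude -message-name smc.snapmir.update.fail
ukhc3::> event filter rule add -filter-name important-no-smc -type include -message-name * -severity EMERGENCY,ALERT,ERROR
ukhc3::> event notification create -filter-name important-no-smc -destinations storage-mail
That only hides the symptom though, so I would still like to understand the queued-but-failed behaviour itself.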
Kind Regards,
Chris.