Hi Alexander,

No, unfortunately cf status does not show this issue.

It only appears when a takeover is attempted (we have tried this three times now, with the same result each time).

Errors:

00000016.0000e424 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.1 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)

00000016.0000e425 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.5 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)

00000016.0000e426 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.3 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)

00000016.0000e427 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.2 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)

Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 0 has only 7 valid children, expected 17.

Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 2 has only 5 valid children, expected 17.

Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 1 has only 12 valid children, expected 17.

Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 3 has only 8 valid children, expected 17.

Sat Mar 18 08:46:15 GMT [BARBIE:ha.takeoverImpNotDef:warning]: Takeover of the partner node is impossible due to reason waiting for partner to recover.
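
For anyone who wants to tally these from a saved console or messages log, here is a minimal Python sketch. It is not a NetApp tool: the regular expressions are assumptions derived only from the lines quoted above, and the function name `summarize` is made up for illustration.

```python
import re

# Sample lines copied from the errors quoted above.
LOG = """\
00000016.0000e424 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.1 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
00000016.0000e425 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.5 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
00000016.0000e426 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.3 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
00000016.0000e427 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.2 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 0 has only 7 valid children, expected 17.
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 2 has only 5 valid children, expected 17.
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 1 has only 12 valid children, expected 17.
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 3 has only 8 valid children, expected 17.
"""

# Patterns are assumptions based on the message text shown in this thread.
RESERVE_RE = re.compile(r"\[disk\.reserveFailed:error\] Disk reservation failed on (\S+)")
MISSING_RE = re.compile(
    r"raid\.assim\.rg\.missingChild:error\]: Aggregate ([^,\s]+), rgobj_verify: "
    r"RAID object (\d+) has only (\d+) valid children, expected (\d+)"
)


def summarize(log_text):
    """Return (disks whose reservation failed, {(aggregate, RAID object): missing children})."""
    failed_disks = []
    missing = {}
    for line in log_text.splitlines():
        m = RESERVE_RE.search(line)
        if m:
            failed_disks.append(m.group(1))
            continue
        m = MISSING_RE.search(line)
        if m:
            aggr = m.group(1)
            obj, valid, expected = (int(m.group(i)) for i in (2, 3, 4))
            missing[(aggr, obj)] = expected - valid
    return failed_disks, missing


if __name__ == "__main__":
    disks, missing = summarize(LOG)
    print("Reservation failed on:", ", ".join(disks))
    for (aggr, obj), count in sorted(missing.items()):
        print(f"{aggr} RAID object {obj}: {count} children missing")
```

Run against a full log, this only counts and groups the two message types; it says nothing about the cause of the failures.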

By default, aggr scrub is only configured to run for 10 hours every Sunday at 1 am.

Once the reconstruction had completed and the disk firmware had been upgraded, we changed this to run continuously so that the scrubs could complete.

Kind Regards,

Chris.

From: Alexander Griesser [mailto:AGriesser@anexia-it.com]
Sent: 03 April 2017 11:08
To: Chris Hague; toasters@teaparty.net
Subject: AW: RAID Reconstruction after ONTAP upgrade?

I was running shelf, ACP and disk firmware upgrades prior to the upgrade and I also installed the new disk qualification package as recommended.

Here’s the output of my scrub status for the affected aggregate:

::> aggr scrub -action status -aggregate blabla_data

Raid Group:/blabla_data/plex0/rg0, Is Suspended:false, Last Scrub:Sun Apr 2 04:24:20 2017

Raid Group:/blabla_data/plex0/rg1, Is Suspended:true, Last Scrub:Sun Mar 19 06:24:32 2017, Percentage Completed:38%

Raid Group:/blabla_data/plex0/rg2, Is Suspended:true, Percentage Completed:40%
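
To pick out the suspended raid groups from output like this without reading it by eye, a small Python sketch could look as follows. It assumes each raid group is reported on a single line (any wrapped continuation joined back on), that fields are separated by ", " and that the field names are exactly those shown above; `parse_scrub_status` is a hypothetical helper name, not part of ONTAP.

```python
# Sample text mirrors the scrub status output above, one raid group per line.
SCRUB_OUTPUT = """\
Raid Group:/blabla_data/plex0/rg0, Is Suspended:false, Last Scrub:Sun Apr 2 04:24:20 2017
Raid Group:/blabla_data/plex0/rg1, Is Suspended:true, Last Scrub:Sun Mar 19 06:24:32 2017, Percentage Completed:38%
Raid Group:/blabla_data/plex0/rg2, Is Suspended:true, Percentage Completed:40%
"""


def parse_scrub_status(text):
    """Turn each 'Raid Group:...' line into a dict of field name -> value."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Raid Group:"):
            continue  # skip prompts, blank lines and anything else
        fields = {}
        for part in line.split(", "):
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = part.partition(":")
            fields[key.strip()] = value.strip()
        records.append(fields)
    return records


if __name__ == "__main__":
    for rec in parse_scrub_status(SCRUB_OUTPUT):
        if rec.get("Is Suspended") == "true":
            done = rec.get("Percentage Completed", "unknown")
            last = rec.get("Last Scrub", "no completed scrub reported")
            print(f"{rec['Raid Group']}: suspended at {done}, last scrub: {last}")
```

A raid group with no `Last Scrub` field is reported as such rather than guessed at, since the output above omits the field for rg2.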

So I guess I’ll leave that running for some time before I try another takeover.

How did you check for the disk reservation and missing raid group child object errors? Does `cf status` on this system tell you that a takeover is not possible due to this issue or does it only tell you when you try to run a takeover?

Best,

Alexander Griesser

Head of Systems Operations

ANEXIA Internetdienstleistungs GmbH

E-Mail: AGriesser@anexia-it.com

Web: http://www.anexia-it.com

Headquarters address, Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt

Managing Director: Alexander Windbichler

Commercial register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601

From: Chris Hague [mailto:Chris_Hague@ajg.com]
Sent: Monday, 3 April 2017 12:02
To: Alexander Griesser <AGriesser@anexia-it.com>; toasters@teaparty.net
Subject: RE: RAID Reconstruction after ONTAP upgrade?

NetApp said to upgrade the disk firmware and allow an aggr scrub to complete. (When we looked at the output of aggr scrub status -v, some of the scrubs hadn’t completed for over a year, and others had never completed!)

As an aside, we have another issue on the same system: disk reservation errors and missing RAID group child objects, which are also preventing a graceful takeover. This is a rare bug that requires an ONTAP upgrade, but as we cannot take over gracefully, we are awaiting an outage window to perform a disruptive upgrade (DU).

Kind Regards,

Chris.

From: Alexander Griesser [mailto:AGriesser@anexia-it.com]
Sent: 03 April 2017 10:57
To: Chris Hague; toasters@teaparty.net
Subject: AW: RAID Reconstruction after ONTAP upgrade?

Did you ever find out the reason for this?

Alexander Griesser


From: Chris Hague [mailto:Chris_Hague@ajg.com]
Sent: Monday, 3 April 2017 11:56
To: Alexander Griesser <AGriesser@anexia-it.com>; toasters@teaparty.net
Subject: RE: RAID Reconstruction after ONTAP upgrade?

Hi Alexander,

We have seen this on 7-Mode (FAS3250) following a cf takeover and giveback.

We got the same output as you, and no disks failed before or after the procedure.

Kind Regards,

Chris.

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Alexander Griesser
Sent: 02 April 2017 19:08
To: toasters@teaparty.net
Subject: RAID Reconstruction after ONTAP upgrade?

Hi,

Has anyone ever seen a RAID reconstruction start immediately after an ONTAP upgrade?

I just upgraded one of my older filers to 8.3.1P2 and it is now running a reconstruction on one of its aggregates for (at least to me) no obvious reason.

During the boot up of this controller after the upgrade, I saw the following message on the console which did not show up on the second controller:

Creating trace file /etc/log/rastrace/RAID_0_20170402_17:18:00:095890.dmp

No disks show as broken, or in maintenance mode, or anything like that – so any hints would be welcome.

Here’s the output of `aggr status -r` on this controller:

RAID Disk Device   HA SHELF BAY CHAN Pool Type RPM  Used (MB/blks)     Phys (MB/blks)
--------- -------- -- ----- --- ---- ---- ---- ---- ------------------ ------------------
dparity   0b.22.23 0b 22    23  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
parity    0b.22.4  0b 22    4   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.5  2a 21    5   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.5  0b 22    5   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.6  2a 21    6   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.6  0b 22    6   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.7  2a 21    7   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.7  0b 22    7   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816 (reconstruction 3% completed)
data      2a.21.8  2a 21    8   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.8  0b 22    8   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.9  2a 21    9   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.9  0b 22    9   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.10 2a 21    10  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.10 0b 22    10  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.11 2a 21    11  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.11 0b 22    11  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.12 2a 21    12  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.12 0b 22    12  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.13 2a 21    13  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
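
In a listing this long the one reconstructing disk is easy to miss, so here is a rough Python sketch that pulls it out. It assumes the rows are whitespace-separated with the RAID role first and the device name second, and that an in-progress rebuild is marked with the "(reconstruction N% completed)" suffix seen above; `reconstructing_disks` is a hypothetical helper name.

```python
import re

# Three rows copied from the listing above, including the reconstructing disk.
ROWS = """\
data 2a.21.7 2a 21 7 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.7 0b 22 7 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (reconstruction 3% completed)
data 2a.21.8 2a 21 8 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
"""

# Assumed marker format, taken from the single example in this thread.
RECON_RE = re.compile(r"\(reconstruction (\d+)% completed\)")


def reconstructing_disks(text):
    """Return a list of (device, RAID role, percent complete) for rebuilding disks."""
    result = []
    for line in text.splitlines():
        m = RECON_RE.search(line)
        if not m:
            continue
        cols = line.split()
        # cols[0] is the RAID role (data/parity/dparity), cols[1] the device name.
        result.append((cols[1], cols[0], int(m.group(1))))
    return result


if __name__ == "__main__":
    for device, role, pct in reconstructing_disks(ROWS):
        print(f"{device} ({role}) is reconstructing: {pct}% completed")
```

Because it splits on any whitespace, the same function works whether the columns are aligned or collapsed to single spaces.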

Thanks,

Alexander Griesser
