The disks may have SCSI reservations set on them that prevent the partner from accessing them; at least, the symptoms match exactly. I believe there are diagnostic commands to display reservations, but I do not have them handy.

 

I would try the following:

 

a) Make sure qual_devices is up to date on both nodes; update it if needed

b) Halt both partners with halt -f, which prevents takeover

c) Boot both into maintenance mode

d) Run “storage release disks” on both nodes

e) Reboot both nodes

 

What does NetApp support say?

 

 

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Cleverley
Sent: Tuesday, March 10, 2015 2:42 AM
To: <Toasters@teaparty.net>
Subject: Shelf recognition problem with DS14s

 

Greetings,

 

My apologies for the long email but I'm trying to put in as many details as possible.

 

I have some DS14 shelves attached to some 6290s running 8.1.2P4 7-mode.  I've run into an odd problem that I think will require a takeover/giveback, but wanted to see if I've missed anything.  I'm pretty sure the problem is not the shelf IDs :-)

 

Controller A sees all disks and shelves through the loop just fine.  Controller B sees both paths, but it shows only question marks for the shelf number and bay.  All connections are on PCI HBAs, nothing is using the onboard ports.

 

Here is an example of the storage show disk -p command from Controller B:

 

9d.82      B    8d.82      A     ?    ?  

 

Here is the same path on controller B using a sysconfig -a:

       82  :                                  0.0GB 0B/sect (Startup failed.)

 

As you can see, it sees device 82 down both paths, but it can't identify the shelf or the drive bay.  When I look on controller A, it has valid information.  If I look at a disk from controller B, it won't show any information on the disk like serial number, disk size, firmware, etc.  Controller A sees everything just fine and does not have any Startup failed messages.
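
To compare the two controllers across a whole loop rather than one disk at a time, the path listing can be scanned for entries whose shelf or bay is a question mark. The following is only a minimal sketch in Python, assuming six whitespace-separated columns (primary, port, secondary, port, shelf, bay) as in the line quoted above. The header and separator lines and the second, healthy disk line in SAMPLE are assumptions made up for illustration, and the function names are hypothetical.

```python
# Sketch: flag disks whose shelf or bay could not be identified in
# captured `storage show disk -p` output. Assumes dual-path disks, i.e.
# six whitespace-separated columns per disk line.

SAMPLE = """\
PRIMARY    PORT SECONDARY  PORT SHELF BAY
---------- ---- ---------- ---- ----- ---
9d.82      B    8d.82      A     ?    ?
8d.16      A    9d.16      B     1    0
"""
# The header, the separator and the 8d.16 line are illustrative
# assumptions; only the 9d.82 line comes from the original message.


def parse_disk_paths(output):
    """Return one dict per disk line found in the captured output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Disk names look like "<adapter>.<id>"; anything else (header,
        # separator, blank or single-path lines) is skipped.
        if len(fields) != 6 or "." not in fields[0]:
            continue
        primary, primary_port, secondary, secondary_port, shelf, bay = fields
        rows.append({
            "primary": primary,
            "primary_port": primary_port,
            "secondary": secondary,
            "secondary_port": secondary_port,
            "shelf": shelf,
            "bay": bay,
        })
    return rows


def unidentified(rows):
    """Return the rows whose shelf or bay is reported as '?'."""
    return [r for r in rows if r["shelf"] == "?" or r["bay"] == "?"]


if __name__ == "__main__":
    for row in unidentified(parse_disk_paths(SAMPLE)):
        print(row["primary"], row["secondary"], "shelf/bay unknown")
```

Running the same check on output captured from each controller would show which disks only one head fails to identify.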

 

If I do a storage show disk -n, controller A sees 84 drives, but controller B only sees 4.  Those 4 don't show correctly either.  If I manually pull a drive and reseat it, it will show up as an unowned drive with no valid shelf or bay slot.  It will show up correctly in the sysconfig -a output.  

 

Here are things I've tried with no luck.

 

1.  New FC ports on the filer side.

2.  Reseating the shelf interface cards on both shelves 1 and 6 where the FC connections come in.

3.  New fibre cable between shelf and filer.

4.  Swapped controller A & B connection in shelf #1.  Controller A sees everything correctly down the same path that controller B doesn't.  Controller B doesn't see anything correctly down the path controller A was using and seeing everything.

5.  Manually pulled a drive from a shelf and plugged it back in.  

6.  Verified I am not at any maximum number of drives or shelf count per cluster limit.

7.  Verified both heads have the same shelf/drive firmware and qual_devices packages.

 

The only thing I can think of is that somehow the registry or something on controller B has become corrupted.   Since I have a head in an odd condition for this loop of disks only, I'm a little reluctant to do a takeover and giveback.  What if it comes back and doesn't recognize any of its drives :-)

 

Thanks,

 

Jeff


 

--

Jeff Cleverley
IT Engineer

4380 Ziegler Road
Fort Collins, Colorado 80525
970-288-4611