On Mon, Mar 9, 2015 at 11:25 PM, Borzenkov, Andrei <andrei.borzenkov@ts.fujitsu.com> wrote:

It could be that the disks have SCSI reservations set on them that prevent the partner from accessing them. At least, the symptoms match exactly. I believe there are diagnostic commands to display reservations, but I do not have them handy.


I'll look into it, but I doubt it.  All of these shelves were on a different cluster until ~9 months ago.  They were taken down and ownership removed in maintenance mode.  There were 7 other loops with 6 shelves each and all of those have been fine.  This one is fine for one filer but the other refuses to see them. 

 

I would try to

 

a)      Make sure you have up to date qual_devices on both nodes; update if needed

Already done. 

b)      halt -f to prevent takeover - halt both partners

Not going to happen anytime soon.  No downtime will be scheduled for several months at a minimum.   

c)       boot in maintenance mode on both

d)      perform “storage release disks” on both nodes

e)      reboot both nodes

 

What does NetApp support say?


Similar type of thing.  They were focused more on the shelves.  They wanted me to pull every controller on every shelf in the loop.  I pulled shelf 1 and shelf 6 with no luck.  Since the other controller sees the same shelves through the same shelf interconnect cables, it tells me those are not the problem.  

In the meantime I've removed all disk ownership for every disk in that loop.  It will have to wait until I can schedule downtime.

Thanks,

Jeff 

 

 

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Cleverley
Sent: Tuesday, March 10, 2015 2:42 AM
To: <Toasters@teaparty.net>
Subject: Shelf recognition problem with DS14s

 

Greetings,

 

My apologies for the long email but I'm trying to put in as many details as possible.

 

I have some DS14 shelves attached to some 6290s running 8.1.2P4 7-mode.  I've run into an odd problem that I think will require a takeover/giveback, but wanted to see if I've missed anything.  I'm pretty sure the problem is not the shelf IDs :-)

 

Controller A sees all disks and shelves through the loop just fine.  Controller B sees both paths, but it shows only question marks for the shelf number and bay.  All connections are on PCI HBAs; nothing is using the onboard ports.

 

Here is an example of the storage show disk -p command from Controller B:

 

9d.82      B    8d.82      A     ?    ?  

 

Here is the same path on controller B using a sysconfig -a:

       82  :                                  0.0GB 0B/sect (Startup failed.)

 

As you can see, it sees device 82 down both paths, but it can't identify the shelf or the drive bay.  When I look on controller A, it has valid information.  If I look at a disk from controller B, it doesn't show any information about the disk, such as serial number, disk size, or firmware.  Controller A sees everything just fine and does not have any "Startup failed" messages.
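
A minimal sketch, not part of the original troubleshooting: to list every device in this state rather than checking one line at a time, saved storage show disk -p output could be scanned with a short Python script. The six-column layout (primary, port, secondary, port, shelf, bay) is an assumption inferred from the sample line above, and the function name and the second sample row are hypothetical.

```python
def find_unidentified_disks(output):
    """Return primary device names whose shelf or bay column is '?'."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout: PRIMARY PORT SECONDARY PORT SHELF BAY.
        # Lines with any other field count are skipped.
        if len(fields) != 6:
            continue
        primary, _port1, _secondary, _port2, shelf, bay = fields
        if shelf == "?" or bay == "?":
            flagged.append(primary)
    return flagged


# First row is the line quoted above; the second row is made up
# to show what a healthy entry might look like.
sample = """\
9d.82      B    8d.82      A     ?    ?
9d.83      B    8d.83      A     5    3
"""

print(find_unidentified_disks(sample))  # prints ['9d.82']
```

Running the same script against output captured from each controller would show which devices only one head fails to identify.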

 

If I do a storage show disk -n, controller A sees 84 drives, but controller B only sees 4.  Those 4 don't show correctly either.  If I manually pull a drive and reseat it, it will show up as an unowned drive with no valid shelf or bay slot.  It will show up correctly in the sysconfig -a output.  

 

Here are things I've tried with no luck.

 

1.  New FC ports on the filer side.

2.  Reseating the shelf interface cards on both shelves 1 and 6 where the FC connections come in.

3.  New fibre cable between shelf and filer.

4.  Swapped controller A & B connection in shelf #1.  Controller A sees everything correctly down the same path that controller B doesn't.  Controller B doesn't see anything correctly down the path that controller A had been using without problems.

5.  Manually pulled a drive from a shelf and plugged it back in.  

6.  Verified I am not at any maximum number of drives or shelf count per cluster limit.

7.  Verified both heads have the same shelf/drive firmware and qual_devices packages.

 

The only thing I can think of is that the registry or something else on controller B has somehow become corrupted.  Since I have a head in an odd condition for this loop of disks only, I'm a little reluctant to do a takeover and giveback.  What if it comes back and doesn't recognize any of its drives :-)

 

Thanks,

 

Jeff


 

--

Jeff Cleverley
IT Engineer

4380 Ziegler Road
Fort Collins, Colorado 80525
970-288-4611



