On Mon, Mar 9, 2015 at 11:25 PM, Borzenkov, Andrei <andrei.borzenkov@ts.fujitsu.com> wrote:
It could be that the disks have SCSI reservations set on them that prevent the partner from accessing them. At least, the symptoms match exactly. I believe there are diagnostic commands to display reservations, but I do not have them handy.
I'll look into it, but I doubt it. All of these shelves were on a different cluster until ~9 months ago. They were taken down and ownership removed in maintenance mode. There were 7 other loops with 6 shelves each and all of those have been fine. This one is fine for one filer but the other refuses to see them.
I would try to
a) Make sure you have up to date qual_devices on both nodes; update if needed
Already done.
b) halt -f to prevent takeover - halt both partners
Not going to happen anytime soon. No downtime will be scheduled for several months at a minimum.
c) boot in maintenance mode on both
d) perform “storage release disks” on both nodes
e) reboot both nodes
What does NetApp support say?
Similar type of thing. They were focused more on the shelves. They wanted me to pull every controller on every shelf in the loop. I pulled shelf 1 and shelf 6 with no luck. Since the other controller sees the same shelves through the same shelf interconnect cables, it tells me those are not the problem.
In the meantime I've removed all disk ownership for every disk in that loop. It will have to wait until I can schedule downtime.
Thanks,
Jeff
*From:* toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] *On Behalf Of* Jeff Cleverley
*Sent:* Tuesday, March 10, 2015 2:42 AM
*To:* Toasters@teaparty.net
*Subject:* Shelf recognition problem with DS14s
Greetings,
My apologies for the long email but I'm trying to put in as many details as possible.
I have some DS14 shelves attached to some 6290s running 8.1.2P4 7-mode. I've run into an odd problem that I think will require a takeover/giveback, but wanted to see if I've missed anything. I'm pretty sure the problem is not the shelf IDs :-)
Controller A sees all disks and shelves through the loop just fine. Controller B sees both paths, but it shows only question marks for the shelf number and bay. All connections are on PCI HBAs; nothing is using the onboard ports.
Here is an example line from the storage show disk -p output on Controller B:
9d.82 B 8d.82 A ? ?
Here is the same path on controller B using a sysconfig -a:
82 : 0.0GB 0B/sect (Startup failed.)
As you can see, it sees device 82 down both paths, but it can't identify the shelf or the drive bay. When I look on controller A, it has valid information. If I look at a disk from controller B, it won't show any information about the disk such as serial number, disk size, or firmware. Controller A sees everything just fine and does not have any "Startup failed" messages.
If I do a storage show disk -n, controller A sees 84 drives, but controller B only sees 4. Those 4 don't show correctly either. If I manually pull a drive and reseat it, it will show up as an unowned drive with no valid shelf or bay slot. It will show up correctly in the sysconfig -a output.
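For anyone who wants to check a whole loop for the same symptom, here is a rough Python sketch that scans saved storage show disk -p output for entries whose shelf or bay shows up as a question mark. It assumes the six-column layout seen in the example line above (primary path, port, secondary path, port, shelf, bay). The function name find_unidentified_disks is hypothetical, and the second sample line is made up purely for contrast; it is not real output from these filers.

```python
def find_unidentified_disks(output):
    """Return (primary, secondary) path pairs whose shelf or bay is '?'.

    Assumes six whitespace-separated columns per disk line:
    primary path, port, secondary path, port, shelf, bay.
    """
    unidentified = []
    for line in output.splitlines():
        fields = line.split()
        # Ignore blank lines, separator rows, and anything without six columns.
        if len(fields) != 6:
            continue
        primary, _primary_port, secondary, _secondary_port, shelf, bay = fields
        # Ignore a column header row or a dashed separator row.
        if primary == "PRIMARY" or primary.startswith("-"):
            continue
        if shelf == "?" or bay == "?":
            unidentified.append((primary, secondary))
    return unidentified


# First line is the example from controller B; the second is invented.
sample = """\
9d.82 B 8d.82 A ? ?
9d.16 B 8d.16 A 1 0
"""

print(find_unidentified_disks(sample))  # [('9d.82', '8d.82')]
```

Running this against output captured from each controller would make it easy to compare which devices each head fails to place in a shelf and bay.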
Here are things I've tried with no luck.
- New FC ports on the filer side.
- Reseating the shelf interface cards on both shelves 1 and 6 where the FC connections come in.
- New fibre cable between shelf and filer.
- Swapped controller A & B connection in shelf #1. Controller A sees everything correctly down the same path that controller B doesn't. Controller B doesn't see anything correctly down the path controller A was using and seeing everything.
- Manually pulled a drive from a shelf and plugged it back in.
- Verified I am not at any maximum number of drives or shelf count per cluster limit.
- Verified both heads have the same shelf/drive firmware and qual_devices packages.
The only thing I can think of is that the registry or something else on controller B has somehow become corrupted. Since I have a head in an odd condition for this loop of disks only, I'm a little reluctant to do a takeover and giveback. What if it comes back and doesn't recognize any of its drives :-)
Thanks,
Jeff
--
Jeff Cleverley IT Engineer
4380 Ziegler Road Fort Collins, Colorado 80525 970-288-4611