The root problem here is that nodes outside the cluster are receiving resets from Linux hosts. I'm not a low-level SCSI expert, but we once had a problem where resets were being sent and causing issues, and I recall hearing that they affect the entire zone, meaning everything that can "see" the initiator is told to reset.

You can change nondisruptively from your current zoning setup to a better one where each zone contains a single initiator and a single target. Also, you mentioned "hard" zoning: did you mean that literally, i.e. your zones are defined by physical switch port locations?
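
For illustration, here is a rough Python sketch of how a "two initiators, multiple targets" zone could be broken up into single-initiator, single-target zones. The aliases and WWPNs are made up, the function name and the z_<initiator>_<target> naming scheme are just my own convention, and it only prints a plan; it does not generate commands for any particular switch vendor.

```python
from itertools import product


def single_initiator_single_target_zones(initiators, targets):
    """Build one zone per (initiator, target) pair.

    initiators, targets: dicts mapping an alias to a WWPN.
    Returns a dict mapping a zone name to an (initiator WWPN, target WWPN) tuple.
    """
    zones = {}
    pairs = product(sorted(initiators.items()), sorted(targets.items()))
    for (i_alias, i_wwpn), (t_alias, t_wwpn) in pairs:
        # Each zone gets exactly two members: one initiator and one target.
        zones[f"z_{i_alias}_{t_alias}"] = (i_wwpn, t_wwpn)
    return zones


# Hypothetical aliases and WWPNs, for illustration only.
initiators = {
    "host1_hba0": "10:00:00:00:c9:00:00:01",
    "host2_hba0": "10:00:00:00:c9:00:00:02",
}
targets = {
    "filer1_0a": "50:0a:09:80:00:00:00:01",
    "filer2_0a": "50:0a:09:80:00:00:00:02",
}

zones = single_initiator_single_target_zones(initiators, targets)
for name, members in zones.items():
    print(name, *members)
```

With two initiators and two targets that gives four zones of two members each, so no initiator shares a zone with another initiator.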

On Mon, Feb 2, 2015 at 9:10 AM, Momonth <momonth@gmail.com> wrote:

On Mon, Feb 2, 2015 at 1:03 PM, Borzenkov, Andrei
<andrei.borzenkov@ts.fujitsu.com> wrote:
> My best guess is that the filer ports were configured as initiators by default and somehow conflicted with the host HBAs (the filer will try to use any LUNs it finds as disks). Do you use two-port zones or fan-out (single initiator, multiple targets)? Note that the motherboard replacement procedure recommends disconnecting the ports until they are properly configured.
>

Due to "historical reasons" our zones are "two initiators, multiple
targets". I know it's sub-optimal, but that's the way it is. Such
zones have always worked with controlled failovers, ONTAP upgrades, etc.

When the NetApp technician arrived, I specifically asked him whether it
would be best to disable the respective ports on the fabrics for the
filer in question (as I believe I have seen this behaviour once before), but
the answer was "no, it should not affect anything".

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters