Hi All,
I hit the following bug on one of the filers (FAS3260) I manage:
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=659544
This filer (filer-prod-204) runs as an HA pair with filer-prod-203. Both are connected to two redundant FC SAN fabrics (one connection from each filer per fabric). Other HA pairs are connected to the same fabrics, e.g. filer-prod-201 / filer-prod-202. All of our filers run in 'single-image' mode, and the FC SAN fabrics run in "hard zoning" mode.
NetApp support's conclusion was to replace the motherboard on the filer, and we proceeded with that.
Here is the issue we ran into, for which I have no explanation. I hope you can help me with it:
Once filer-prod-204 had its motherboard replaced, was powered on, and entered HW diagnostics mode, I saw messages like the ones below *on every other filer* connected to the same fabric (e.g. filer-prod-201), causing issues on the hosts (mainly CentOS 6.4) attached to them:
Fri Jan 30 20:07:45 CET [filer-prod-201: scsitarget.ispfct.targetReset:notice]: FCP Target 0c: Target was Reset by the Initiator at Port Id: 0x11000 (WWPN 5001438021e071ec)
Fri Jan 30 20:07:46 CET [filer-prod-201: scsitarget.ispfct.targetReset:notice]: FCP Target 0c: Target was Reset by the Initiator at Port Id: 0x10200 (WWPN 50014380186abac4)
...
Fri Jan 30 20:08:14 CET [filer-prod-201: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '0c' from '50:01:43:80:21:e0:71:ec', address 0x11000.
Fri Jan 30 20:08:14 CET [filer-prod-201: scsitarget.ispfct.portLogin:notice]: FCP login on Fibre Channel adapter '0c' from '50:01:43:80:18:6a:ba:c4', address 0x10200.
So every single initiator on the filers *not involved* in the maintenance was reset, then tried to log back in, was reset again, and it looped like that until I disabled filer-prod-204's target ports on the FC switches. Once filer-prod-204 had booted into ONTAP, the issue was gone. I know this because when I re-enabled filer-prod-204's target ports, I didn't see any messages like the ones above, and everything has been running fine since then.
Does anyone have an idea what was happening here, and why?
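For anyone who wants to check their own logs for the same reset/login loop, here is a rough Python sketch (not a NetApp tool) that tallies the two message types per adapter and WWPN. The regular expressions are assumptions based only on the two message shapes quoted above, and the names tally_events and normalize_wwpn are made up for illustration.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the two message shapes quoted in this post.
RESET_RE = re.compile(
    r"scsitarget\.ispfct\.targetReset:notice\]: FCP Target (\w+): "
    r".*?\(WWPN ([0-9a-fA-F]{16})\)"
)
LOGIN_RE = re.compile(
    r"scsitarget\.ispfct\.portLogin:notice\]: FCP login on Fibre Channel "
    r"adapter '(\w+)' from '([0-9a-fA-F:]{23})'"
)


def normalize_wwpn(wwpn):
    """Strip colons and lowercase so both message styles compare equal."""
    return wwpn.replace(":", "").lower()


def tally_events(log_text):
    """Count reset and login messages per (adapter, WWPN) pair."""
    resets = Counter()
    logins = Counter()
    for m in RESET_RE.finditer(log_text):
        resets[(m.group(1), normalize_wwpn(m.group(2)))] += 1
    for m in LOGIN_RE.finditer(log_text):
        logins[(m.group(1), normalize_wwpn(m.group(2)))] += 1
    return resets, logins


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally.py messages.log
    with open(sys.argv[1]) as fh:
        resets, logins = tally_events(fh.read())
    for key in sorted(set(resets) | set(logins)):
        print(key[0], key[1], "resets:", resets[key], "logins:", logins[key])
```

An initiator whose reset count keeps growing alongside its login count would match the loop described above.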
Cheers, Vladimir