Last week we had a Fibre Channel F760 completely crash and burn. Right now the system is in a state where it will serve data for 10 or 15 minutes, then report inconsistent parity errors and crash. We will be sending it to NetApp for failure analysis.
That was a nightmare to say the least - and, of course, it happened while I was on vacation. (My first thought was that somebody was mad at me for flaming about the disk firmware thing and decided to take it out on me personally - yikes - I swear off flaming for the rest of my life.) Anyway, last night another of our filers crashed with the same type of errors. This is an F630 with FC disks. I called NetApp - just got off the phone with them - they are looking into it, but before I lose another filer I thought I'd bounce this one off of the user community.
These are the types of errors it spits out:
Sun Feb 21 04:22:00 MST [isp2100_main]: isp2100_error_proc: dev 8.11 (0.11) data underrun (cmd opcode 0x28) ( 0x0 0x5c28 0x8000 )
Sun Feb 21 04:22:00 MST [isp2100_main]: Disk 8.11: Data underrun (0xfffffc0000c204f0,0x28,0)
A few of those, then a few of these:
Sun Feb 21 04:22:00 MST [raid_stripe_owner]: Inconsistent parity on volume vol0, RAID group 2, stripe #377824.
Sun Feb 21 04:22:00 MST [raid_stripe_owner]: Rewriting bad parity block on volume vol0, RAID group 2, stripe #377824.
Then:
Sun Feb 21 04:23:22 MST [edm_admin]: No valid paths to Enclosure Services in shelf 2 on ha 8.
Sun Feb 21 04:23:22 MST [raid_stripe_owner]: Inconsistent parity on volume vol0, RAID group 2, stripe #377861.
Sun Feb 21 04:23:22 MST [raid_stripe_owner]: Out of messages; cannot dump contents of inconsistent stripe.
Sun Feb 21 04:24:00 MST [edm_admin]: No valid paths to Enclosure Services in shelf 3 on ha 8.
Sun Feb 21 11:30:36 GMT [rc]: de_main: e10 : Link up.
Sun Feb 21 04:30:37 MST [rc]: saving 44M to /etc/crash/core.0.nz ("wafl_check_vbns: vbn too big")
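For anyone who wants to check their own messages file for the same pattern, here is a rough Python sketch that tallies the three message types quoted above: data underruns per disk, inconsistent stripes per RAID group, and shelves that lost their Enclosure Services path. The regular expressions are guessed from these excerpts alone, so other releases may word the messages differently, and the function name `summarize` and the `SAMPLE` list are just for illustration.

```python
import re
from collections import Counter

# Patterns are derived only from the log excerpts quoted in this post
# (an assumption); they are not taken from any NetApp documentation.
UNDERRUN_RE = re.compile(r"Disk (\d+\.\d+): Data underrun")
PARITY_RE = re.compile(
    r"Inconsistent parity on volume ([^,\s]+), RAID group (\d+), stripe #(\d+)"
)
ENCLOSURE_RE = re.compile(
    r"No valid paths to Enclosure Services in shelf (\d+) on ha (\d+)"
)


def summarize(lines):
    """Return (underruns per disk, bad stripes per RAID group, lost shelves)."""
    underruns = Counter()   # disk id -> number of "Data underrun" messages
    bad_stripes = {}        # (volume, raid group) -> set of stripe numbers
    lost_shelves = set()    # (host adapter, shelf) pairs
    for line in lines:
        # finditer copes with several log entries run together on one line.
        for m in UNDERRUN_RE.finditer(line):
            underruns[m.group(1)] += 1
        for m in PARITY_RE.finditer(line):
            volume, group, stripe = m.groups()
            bad_stripes.setdefault((volume, int(group)), set()).add(int(stripe))
        for m in ENCLOSURE_RE.finditer(line):
            shelf, adapter = m.groups()
            lost_shelves.add((int(adapter), int(shelf)))
    return underruns, bad_stripes, lost_shelves


# A few of the entries from this post, used as sample input.
SAMPLE = [
    "Sun Feb 21 04:22:00 MST [isp2100_main]: Disk 8.11: Data underrun (0xfffffc0000c204f0,0x28,0)",
    "Sun Feb 21 04:22:00 MST [raid_stripe_owner]: Inconsistent parity on volume vol0, RAID group 2, stripe #377824.",
    "Sun Feb 21 04:22:00 MST [raid_stripe_owner]: Rewriting bad parity block on volume vol0, RAID group 2, stripe #377824.",
    "Sun Feb 21 04:23:22 MST [edm_admin]: No valid paths to Enclosure Services in shelf 2 on ha 8.",
    "Sun Feb 21 04:23:22 MST [raid_stripe_owner]: Inconsistent parity on volume vol0, RAID group 2, stripe #377861.",
    "Sun Feb 21 04:24:00 MST [edm_admin]: No valid paths to Enclosure Services in shelf 3 on ha 8.",
]

if __name__ == "__main__":
    underruns, bad_stripes, lost_shelves = summarize(SAMPLE)
    print("underruns per disk:", dict(underruns))
    print("inconsistent stripes:", bad_stripes)
    print("shelves without Enclosure Services paths:", sorted(lost_shelves))
```

To run it against a real log, pass an open file instead of `SAMPLE`, e.g. `summarize(open("messages"))`, where `messages` stands for a local copy of the filer's messages file. If the underruns cluster on one disk or one adapter, that would at least narrow down where to look.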
NetApp thinks the previous failure may have been due to heat in the computer room - BUT - the filer *never* spit out any thermal warnings - I'm not too sure I believe that heat was the culprit - I can run it for hours doing disk scrubs, but as soon as I use it for NIS it crashes. This second system is in a different room and has a better ventilated cabinet - again, no thermal warnings in the messages file - but maybe the thermal warning messages don't work? Just out of curiosity - has anyone ever seen an FC system spit out a thermal warning?? (5.1.2P2)
Has anyone experienced these types of problems with FC filers? I've never had a problem with SCSI filers - but these FC ones just seem flaky to me....
Thanks for any insight anyone can lend...
Graham