Yes, we are taking the stance of requesting more information. We'd be more confident recommending the drastic action of replacing all HW if NetApp could point to the specific HW issue(s) in the current HW and demonstrate how it's fixed in newer HW revs. Currently those details are not public/forthcoming…

Without the full picture, we fear the very considerable effort of the HW replacement plan could be wasted if the issue then recurred. We are asking NetApp to make the technical details available to us to raise confidence in the HW replacement plan.

thanks
On Jan 2, 2013, at 3:54 PM, "Jayanathan, David" <djayan@qualcomm.com> wrote:

Interesting that you were advised to replace all HW/cards. We've hit this three times in our environment that I know of, and each time I was provided with the following information:

Bug Number / Title: 519766 / FAS32xx Uncorrectable Machine Check Error

Problem Summary: The storage controller suffered an interruption of service due to an uncorrectable machine check error. The source of the error has not been indicated.

Recommended Solution/Workaround:
If this is the first occurrence:
- Update BIOS (FAS3200) to 5.1.1 or later.
- Update SP firmware to 1.2.3 or later.
- Update Data ONTAP to 8.0.2P4 or later.
- Restart the system and monitor for any repeats.
If this is the second occurrence:
- Replace the motherboard.
- Mark the faulty hardware for RCA under bug 519766.

I have yet to find any references in mail to hitting this bug on any of our systems running 8.0.2P7 or above. Two of the times we hit it were on the same system, so on the 2nd time we upgraded to 8.0.2P4 and performed a motherboard swap. The other time we only did a failback and opted out of a code upgrade.

Thanks,
David

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Fletcher Cocquyt
Sent: Wednesday, January 02, 2013 3:14 PM
To: Doug Siggins
Cc: netapp-users@mailman.stanford.edu; toasters@teaparty.net Lists
Subject: Re: BURT 519766 panic on production 3270

Dec 25 04:19:26 na03.GoCardinal.EDU Dec 25 12:20:49 [na03:mgr.stack.string:notice]: Panic string: Uncorrectable Machine Check Error at CPU1. MC5 Error: STATUS<0xb200001080200e0f>(Val,UnCor,Enable,PCC,ErrCode(Gen,NTO,Gen,Gen,Gen)); PLX PCI-E switch on IO Exp

Thanks. Yes, the fact that NetApp is immediately willing to replace most of our HW indicates they know it's an issue with our current HW. I'd feel better recommending this plan if they could point to the specific HW issue in the current HW and demonstrate how it's fixed in newer HW revs. Currently those details are not public/forthcoming...

On Jan 2, 2013, at 3:02 PM, Doug Siggins <DSiggins@ma.maileig.com> wrote:

Fletcher,

What was the panic string? Did you get a core to NetApp? Sometimes support is a bit reluctant to investigate further unless you press for a real answer. After 2-3 core dumps with the same type of panic string, I start demanding a fix, whether it be hardware or software.

Here are two forum posts:

I had a similar issue on an older filer. It panicked 2-3 times. Luckily, as we prepare to retire the system, the load has dropped significantly over time, and I haven't seen the NMI panic for 6+ months. I had suggested we replace the system immediately and migrate off.

I guess the rule of thumb is that if you see the panic more than once, you should definitely think about hardware replacement.

From: toasters-bounces@teaparty.net [toasters-bounces@teaparty.net] on behalf of Fletcher Cocquyt [fcocquyt@stanford.edu]
Sent: Wednesday, January 02, 2013 5:21 PM
To: toasters@teaparty.net Lists
Cc: netapp-users@mailman.stanford.edu
Subject: BURT 519766 panic on production 3270

Happy 2013!

One of our production 3270 heads panicked and rebooted at 3:30 am on Dec 25. Lump of coal?

The good news is that when our system panicked and rebooted, the failover performed as expected, so we had only a 2-second timeout logged on our ESXi hosts and Oracle, with no downtime.

There is scarce public info on this issue, and NetApp is recommending options ranging from "do nothing (it's rare and may never happen again)" to "replace motherboards and all cards".

Our 3270 clusters (we have 2 in Active:Standby mode) have been stable since we installed them in Feb 2011. We are on 8.1GA; NetApp support says the issue is independent of ONTAP version.

Has anyone else encountered this issue? What was your action and outcome?

thanks,
Fletcher Cocquyt
Stanford University School of Medicine
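For anyone trying to read the panic string quoted earlier in this thread: the value inside STATUS<...> is a raw x86 machine-check status register (IA32_MCi_STATUS), and the parenthesized summary ONTAP prints can be reproduced from its bits. The sketch below assumes the bit layout documented in the Intel SDM (Vol. 3B, machine-check architecture chapter). The helper names, and the short labels for fields ONTAP did not print here, are hypothetical. It only decodes the status flags and the bus/interconnect error code; it does not identify which hardware component failed.

```python
from __future__ import annotations

# IA32_MCi_STATUS flag bits (Intel SDM Vol. 3B). "Val", "UnCor", "Enable"
# and "PCC" match the panic string; the other labels are hypothetical.
STATUS_FLAGS = [
    (63, "Val"),     # VAL: register contents are valid
    (62, "Over"),    # OVER: a previous error was overwritten
    (61, "UnCor"),   # UC: hardware did not correct the error
    (60, "Enable"),  # EN: reporting was enabled for this error
    (59, "MiscV"),   # MISCV: MCi_MISC holds extra information
    (58, "AddrV"),   # ADDRV: MCi_ADDR holds the error address
    (57, "PCC"),     # PCC: processor context corrupt
]

# Bus/interconnect sub-field encodings; "Gen" and "NTO" match the panic string.
PP = ("Src", "Res", "Obs", "Gen")  # participation of the local processor
II = ("Mem", "Rsvd", "IO", "Gen")  # memory vs. I/O transaction
LL = ("L0", "L1", "L2", "Gen")     # hierarchy level
RRRR = {0: "Gen", 1: "Rd", 2: "Wr", 3: "DRd", 4: "DWr",
        5: "IRd", 6: "Prefetch", 7: "Evict", 8: "Snoop"}


def decode_flags(status: int) -> list[str]:
    """Return the names of the set flag bits, highest bit first."""
    return [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]


def decode_bus_error(code: int) -> tuple[str, ...] | None:
    """Decode a bus/interconnect MCA error code (000F 1PPT RRRR IILL).

    Bit 12 (F, correction-report filtering) is ignored. Returns None for
    any other compound error-code family.
    """
    if (code & 0xE800) != 0x0800:
        return None
    rrrr = (code >> 4) & 0xF
    return (
        PP[(code >> 9) & 0b11],
        "TO" if (code >> 8) & 1 else "NTO",  # T bit: request timed out?
        RRRR.get(rrrr, f"0x{rrrr:x}"),
        II[(code >> 2) & 0b11],
        LL[code & 0b11],
    )


def format_status(status: int) -> str:
    """Render a STATUS value in the same style as the ONTAP panic string."""
    parts = decode_flags(status)
    bus = decode_bus_error(status & 0xFFFF)  # bits 15:0 = MCA error code
    if bus is not None:
        parts.append("ErrCode(" + ",".join(bus) + ")")
    return ",".join(parts)


if __name__ == "__main__":
    print(format_status(0xb200001080200e0f))
```

Run against the value from na03's panic, this prints `Val,UnCor,Enable,PCC,ErrCode(Gen,NTO,Gen,Gen,Gen)`, matching ONTAP's summary. The set PCC bit indicates the processor context may be corrupt, which generally means the OS cannot safely continue. The all-"generic" bus/interconnect code does not name a failing component, which is consistent with the BURT's note that the source of the error has not been indicated.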
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters