toasters

toasters@lists.teaparty.net

2 participants
13529 discussions

RE: Network Appliance Upgrade to ONTAP 5.3.4R3P2 and Backups
by Moshe Linzer 04 May '00

Budtool has certified Budtool 4.6.1a in combination with OnTap 5.3.5. Unfortunately, NetApp says that 5.3.5 is buggy and recommends a patched release, which is not certified by Budtool. 5.3.5P2 did not work for me because of the NDMP problems referred to in many emails (Budtool could not connect to NDMP on the filer). 5.3.5R2P2 is SUPPOSED to fix those problems, and I will be testing that release today.

Moshe

>From owner-toasters(a)mathworks.com Tue May 2 21:28:08 2000
>Date: Tue, 2 May 2000 11:21:02 -0500 (CDT)
>From: Jay Orr <orrjl(a)stl.nexen.com>
>To: toasters(a)mathworks.com
>Subject: RE: Network Appliance Upgrade to ONTAP 5.3.4R3P2 and Backups
>MIME-Version: 1.0
>Precedence: bulk
>
>Anyone know if budtool is keeping up? I've been pondering about upgrading
>our F330's to 5.3.5r2 .....
>
>On Tue, 2 May 2000, Kelly Wyatt wrote:
>
>> Keep in mind that VERITAS currently has no plans to qualify any 5.3.5
>> release. If you are concerned about keeping NetBackup and ONTAP in a
>> supported mode you need to be at 5.3.4 (qualified). (They are currently
>> qualifying 5.3.4R2 per email I received from VERITAS support.)
>>
>> Kelly
>>
>> --
>>
>> Kelly Wyatt, Kelly.Wyatt(a)SAS.com
>> Systems Programmer
>> Integrated Solutions Consulting
>> SAS Institute Inc. / SAS Campus Drive / Cary, NC 27513
>> http://www.sas.com
>> The power to know TM
>>
>> -----Original Message-----
>> From: Micheler Klaus (MDCA Villach) [mailto:Klaus.Micheler@infineon.com]
>> Sent: Tuesday, May 02, 2000 2:10 AM
>> To: 'toasters(a)mathworks.com'; 'Kelly.Wyatt(a)sas.com'; 'armijo(a)cs.unm.edu'
>> Subject: RE: Network Appliance Upgrade to ONTAP 5.3.4R3P2 and Backups
>>
>> On Thu, Apr 20, 2000 at 11:37:35AM -0400, Kelly Wyatt wrote:
>> >> We upgraded our filers to NetApp Release 5.3.4R3P2 this weekend. Since
>> >> then, NBU 3.2 keeps downing the tape drives on
>> >> several, but not all of the filers (3 of 9). This seems to happen before,
>> >> during and after backups.
>> >>
>> >> Has anyone else seen this? Thanks!!
>>
>> >we saw this on previous versions of NBU, 3.1.*, the work around was really
>> >lame. cron executed a script which did something like: vmoprcmd -up # for
>> >each drive.
>>
>> Hi there!
>>
>> We also have the same problem with this release of Ontap and Netbackup v3.2.
>> According to Netapp the cause of this problem is a faulty Java Class
>> (Garbage collection) which NDMP uses.
>> This should also be the reason for the problems with Ontap 5.3.5xxx, where
>> the problem got critical and backup stopped working.
>>
>> Netapp recommended us to switch to Ontap 5.3.5R2P1, where the NDMP-BUG is
>> (hopefully) fixed.
>>
>> regards, Klaus Micheler
>> Infineon Technologies DC
>>
>
>-----------
>Jay Orr
>Systems Administrator
>Fujitsu Nexion Inc.
>St. Louis, MO
>

-----------------------------------------------------------------------------
Moshe Linzer                    | Mastery of Unix offers real freedom.
Unix Systems Manager            | The price of freedom is always dear, but
National Semiconductor, Israel  | I'd rather pay for my freedom than live in
Phone: 972-9-970-2247           | a bitmapped, pop-up-happy dungeon like NT.
Fax: 972-9-970-2001             |
Email: moshel(a)nsc.com         | - Thomas Scoville, Unix Review
-----------------------------------------------------------------------------
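For anyone wanting to try the cron workaround quoted above, here is a rough sketch of what such a script might look like. The only thing taken from the thread is the `vmoprcmd -up <drive>` invocation; the drive indices, the function names, and the dry-run default are assumptions made for illustration, and the script has not been run against a real NetBackup server.

```python
import subprocess

# Hypothetical drive indices; the thread only says "for each drive".
DRIVE_INDICES = [0, 1, 2, 3]


def build_up_commands(drive_indices, vmoprcmd="vmoprcmd"):
    """Return one 'vmoprcmd -up <index>' argument list per drive."""
    return [[vmoprcmd, "-up", str(index)] for index in drive_indices]


def bring_drives_up(drive_indices, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for command in build_up_commands(drive_indices):
        if dry_run:
            print(" ".join(command))
        else:
            # check=False: one failing drive should not stop the loop.
            subprocess.run(command, check=False)


if __name__ == "__main__":
    bring_drives_up(DRIVE_INDICES)
```

Run from cron with `dry_run=False` once the printed commands look right for the local drive numbering.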

RE: Data OnTap 5.3.5P2 and NDMP Backups
by Linn, Greg 04 May '00

Paul,

I doubt that the 5.3.5P2 NDMP problem could account for the incomplete tape eject condition you described. Symptoms of the 5.3.5P2 problem are that NDMP on the filer deadlocks and stops responding to network requests from the backup application. This condition can be verified by attempting to telnet to the NDMP port (10000) on the filer.

-Greg

-----Original Message-----
From: Paul Mikulencak [mailto:paul.mikulencak@amd.com]
Sent: Wednesday, May 03, 2000 3:17 PM
To: Linn, Greg
Cc: 'toasters(a)mathworks.com'
Subject: Re: Data OnTap 5.3.5P2 and NDMP Backups

On Wed, 3 May 2000, Linn, Greg wrote:

> An NDMP-based backup problem has been identified in our Data OnTap 5.3.5P2 patch
> release.
>
> During development of Data OnTap 5.3.6, a deadlock condition was discovered
> which impacted the NDMP Java implementation. This condition typically manifests
> itself as an NDMP connection failure during resource depletion conditions. We
> believe it was first introduced in 5.3.5P2. The problem has been resolved in
> 5.3.5R2P2 and in the forthcoming 5.3.6 release.
>
> If you are running 5.3.5P2, and are experiencing NDMP backup problems similar to
> the connection failure described above, we recommend that you upgrade to
> 5.3.5R2P2.

greg (and all),

i'd appreciate a pointer to a description of the "problem" in detail. we're wrestling with something here that i've been assuming was a hardware problem on our storagetek 9740 tape changer, but which hasn't responded to anything i or the storagetek field engineer have tried.

our backups (budtool, 4.6.1, with the jumbo patch, dlt7000 drives scsi attached to each F760 filer) have been failing because fully written tapes are incompletely ejected from their drives. nothing in the messages file on either the budtool host or the netapps in question. tape drives have been replaced (for one filer, three times), and we even replaced the robotic hand. manual loading and unloading of tapes via the storagetek touch panel work without failure. sysconfig -t always shows the drives it should, and budtool always sees them.

i have a hard time laying this at the feet of the particular revision of data ontap we're running (currently 5.3.5R2P1, formerly 5.3.4R2), but most of the hardware angles have been explored. i've been reading messages on the toasters list that indicate that a bunch of folks are having problems with their backups, but nothing quite like what we're experiencing.

anyone have any feedback?

pablo

email paul.mikulencak(a)amd.com
ext. 602-2448        "the word pessimism bothers me
pager 624-0929       because it is often used instead
cube 3.277.042       of the word lucidity." robert bresson
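The telnet check Greg describes can also be scripted so it runs from the backup host before each job. The sketch below is a minimal one: it opens a TCP connection to port 10000 and waits briefly for any bytes from the server. The function name, the three result labels, and the assumption that a healthy NDMP server sends something shortly after the connection opens are mine, not something stated in the thread.

```python
import socket


def probe_ndmp(host, port=10000, timeout=5.0):
    """Classify the NDMP listener as 'unreachable', 'silent', or 'responding'.

    'unreachable': the TCP connection failed (refused, timed out, reset).
    'silent':      connected, but no bytes arrived within the timeout.
    'responding':  connected and the server sent at least one byte.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            try:
                data = conn.recv(64)
            except socket.timeout:
                return "silent"
            # An empty read means the peer closed without sending anything.
            return "responding" if data else "silent"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    # "filer1" is a placeholder hostname.
    print(probe_ndmp("filer1"))
```

A result of "unreachable" or "silent" would only be a hint that the filer is in the state described above; a manual telnet is still worth doing to confirm.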

Veritas Netbackup woes?
by jmiddlebrooks＠datalink.com 04 May '00

Anybody seen Netbackup status code 99 when using Veritas Netbackup NDMP to back up your filer? I am consistently getting status code 99. I have an HP master server and I am running Ontap 5.3.4r2 on my F760, which has four drives attached to it: two off the onboard SCSI port and two off a second SCSI card in the filer.

What path do you use within Netbackup, and are there any special attributes I should use? Anything special to put in the bp.conf? What is the best version of Ontap to run that has no Netbackup bugs?

I have installed and used NDMP with a SUN master server several times and not experienced any problems. This is my only experience with an HP master, and I am having no luck getting backups to be successful all the time. I know these sound like Veritas questions, but I was just curious as to what actual users are doing with success.

Thanks in advance

Jason Middlebrooks
Systems Engineer
Datalink Corp.
888-933-9327 x2970

Re: raid failure
by Robert L. Millner 03 May '00

"AT" == Aiello, Tony <Tony.Aiello(a)netapp.com>: "GDG" == G D Geen <geen(a)msp.sc.ti.com>: AT> As of the 5.3.2 release we added functionality to reassign blocks as they AT> occur. Prior to that we enabled automatic reassignment features on the disk AT> drives to do this. We found that the disks did not handle the reassignment AT> in all cases we'd like so we took control of that function. That would be The fact that whatever version this customer was running was not able to handle that common type of disk failure scares the hell out of me for what 5.3.5R2P1 will not be able to handle and how it will bite me when something breaks. Why didn't this make it in to ONTAP before something as useless as the Java GUI (ok, thats a personal opinion; but try and convince me that its worth more than the ability to appropriately handle disk failures). Netapp had years of experience from other vendors to draw from when it was formed. Part of that experience should have been comprehensive knowledge of the kinds of failures people see in the field. GDG> First and foremost, I take complete responsibility for my filers. I did so GDG> in my message to my management which was forwarded up through VP level. Of course, and there's always more that can be done at everyone's end. There's a long list of things I'd like to both do and see Netapp do. Important problems that vendors missed in their own tests make me really worry for that vendor's Q/A and what a product will cost me later when something pathological happens (after all, that's part of the cost of running the box). Nobody gets everything right immediately and that includes Netapp. Nobody's server is perfect for my environment. Everything is a balance of costs and some of those costs are how well it works on its own. What they miss has alot to do with what I consider my mistake to have been. Was my mistake not buying an EMC or a Sun in the first place or was my mistake not noticing the autosupport message (for example)? 
GDG> Having said that, I do agree with you. This disk should have failed by the GDG> filer two weeks prior. We at TI are now pushing NetApp to be proactive in We have been discussing certain things we'd like to see in their product as well (in case any one is curious I can elaborate more here). GDG> I still prefer to look at the autosupport messages. There is so much that I GDG> glean from these. The information is not just what disks to watch out for Everything in the autosupport messages can be gleaned from commands you run on the filer and checking the logs. I'd be really interested in hearing what kinds of tools you are working on if you don't mind sharing that info. GDG> I cannot disclose what fifteen hours of down time cost Texas Instruments, GDG> Inc. as that is proprietary information though we were fortunate that this GDG> was over a holiday weekend. *shiver* Rob

Data OnTap 5.3.5P2 and NDMP Backups
by Linn, Greg 03 May '00

An NDMP-based backup problem has been identified in our Data OnTap 5.3.5P2 patch release.

During development of Data OnTap 5.3.6, a deadlock condition was discovered which impacted the NDMP Java implementation. This condition typically manifests itself as an NDMP connection failure during resource depletion conditions. We believe it was first introduced in 5.3.5P2. The problem has been resolved in 5.3.5R2P2 and in the forthcoming 5.3.6 release.

If you are running 5.3.5P2, and are experiencing NDMP backup problems similar to the connection failure described above, we recommend that you upgrade to 5.3.5R2P2.

Greg Linn
Manager, NDMP Development
linn(a)netapp.com
408.822.3752 telephone
408.822.4457 fax

Re: raid failure
by Robert L. Millner 03 May '00

SL> filter out those hourly status messages. There's no way I can wade
SL> through those weekly emails because /etc/messages is usually about
SL> 5000 lines long. I'd probably forget to check the logs by hand, but

Swatch is another really good tool for doing this. It can be used to compact a large number of entries into a summary and weed out useless information.

Rob
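To illustrate the kind of compaction meant here (this is not Swatch itself, only a small stand-in for the idea), a script can strip the timestamp from each line of /etc/messages, drop lines matching patterns you consider noise, and count how often each remaining message occurs. The timestamp layout is assumed to match the sample filer message quoted elsewhere in this thread; the function name and the ignore patterns are hypothetical.

```python
import re
from collections import Counter

# Assumed timestamp layout, e.g. "Sun Apr 30 04:38:50 MDT ".
TIMESTAMP = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ ")


def summarize(lines, ignore=()):
    """Return (message, count) pairs, most frequent first.

    Timestamps are removed so repeated messages collapse into one entry;
    lines matching any regex in `ignore` are dropped entirely.
    """
    counts = Counter()
    for line in lines:
        body = TIMESTAMP.sub("", line.strip())
        if not body or any(re.search(pattern, body) for pattern in ignore):
            continue
        counts[body] += 1
    return counts.most_common()


if __name__ == "__main__":
    with open("/etc/messages") as log:
        # "hourly status" is a placeholder pattern for the noise to drop.
        for message, count in summarize(log, ignore=[r"hourly status"]):
            print(f"{count:6d}  {message}")
```

A 5000-line log then shrinks to one line per distinct message, which makes an unusual entry much easier to spot.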

RE: raid failure
by Aiello, Tony 03 May '00

Hello,

I don't see the reference to the version of OnTap used, but perhaps I can relate some information.

As of the 5.3.2 release we added functionality to reassign blocks as they occur. Prior to that we enabled automatic reassignment features on the disk drives to do this. We found that the disks did not handle the reassignment in all cases we'd like, so we took control of that function. That would be why you could see multiple reports of bad blocks showing up in subsequent scrubs. The disk did not do the reassignment and so this bad spot was left on the media. As of 5.3.2 messages would appear to the effect of:

  Sun Apr 30 04:38:50 MDT [isp2100_main]: Disk 5.14: sector 33601609 will be reassigned

Reassignment means the device uses a different piece of media to store information for some block address. Not all errors returned from a disk can be handled by a block reassignment; really only those that come back as unrecoverable media errors can be repaired by performing a block level reassignment. Should the reassignment fail for some reason, then the disk is failed, as sector-wise errors can lead to large reliability issues.

Tony

--------------------------
Tony Aiello, Mgr. Storage Software
mailto:taiello@netapp.com
Ph:(408)822-6515

> -----Original Message-----
> From: Robert L. Millner [mailto:rmillner@transmeta.com]
> Sent: Wednesday, May 03, 2000 10:08 AM
> To: toasters
> Subject: Re: raid failure
>
> Hey,
>
> GDG> autosupport messages too. As I went back through the autosupport
> GDG> logs that are e-mailed to me each week, I found that the problem
> GDG> began approximately two weeks earlier. Every time the disk tried
> GDG> to read a particular sector of the disk, an error message would
> GDG> appear in the messages log indicating such an event had occurred. Had
> GDG> I not been busily working other issues, to the detriment of my
> GDG> filers, I would have failed this disk at least a week prior.
>
> My immediate question to Netapp in this case would be why was the
> periodic disk scrubbing not sufficient to cause the failed sectors to be
> replaced (this was going on for two weeks)? Why upon detection of the
> block failure (after all, if a log message is generated, then the filer
> knows it happened) was the data not immediately reconstructed elsewhere
> and the disk blocks marked as unusable? A block sized RAID
> reconstruction and re-write should be a trivial problem for the filer to
> solve. This is the kind of detail I'd expect a storage vendor to place
> a much higher priority on than having a java GUI. This is a well known
> way that disks fail; not some mysterious voodoo issue. I worry about
> what other well known failure modes were left out till a later release
> of ONTAP.
>
> GDG> occurred. Had I not been busily working other issues, to the
> GDG> detriment of my filers, I would have failed this disk at least
> GDG> a week prior.
>
> Had Netapp not been busily working other issues, to the detriment of you
> and your users' time and data this disk would have failed itself or the
> filer would have taken some other corrective action on its own. You
> should have your own automated methods for looking for problems (like a
> script which analyses the logs and reports problems back to you). Don't
> be afraid to turn into a nasty bastard in a situation like this. None
> of my users would hesitate for a moment and that may be your last
> recourse to making sure that people understand the priority of certain
> kinds of issues.
>
> I realize that I am being brutal to Netapp here but that kind of failure
> would cost us more than twice what we have invested in our entire Netapp
> infrastructure in time to rebuild the data. It gives me that cold,
> prickly, paranoid feeling about all the data we have on our filers. I
> also realize that there are other potential problems that would have
> caused a dual disk failure in one raid group. This specific problem
> should have been dealt with more gracefully by the filer on its own. If
> it didn't, then your case alone should have been enough to put it on the
> 'Must Fix This Immediately!' list.
>
> Rob
>
> "You're just the little bundle of negative reinforcement I've
> been looking for." -Mr. Gone
>
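The reassignment message Tony quotes is regular enough to pick out of the messages log mechanically, which is one way to get the "script which analyses the logs" mentioned in the quoted text. The sketch below groups reassigned sectors by disk and lists disks that have crossed a count you choose. The regular expression is built from the single sample line in this message, so other message variants will not match; the function names and the threshold of 3 are assumptions, not NetApp guidance.

```python
import re
from collections import defaultdict

# Built from the one sample line in the thread:
# "... [isp2100_main]: Disk 5.14: sector 33601609 will be reassigned"
REASSIGN = re.compile(
    r"\[(?P<source>[^\]]+)\]: Disk (?P<disk>\S+): "
    r"sector (?P<sector>\d+) will be reassigned"
)


def reassigned_sectors(lines):
    """Map each disk name to the set of sector numbers reported for it."""
    by_disk = defaultdict(set)
    for line in lines:
        match = REASSIGN.search(line)
        if match:
            by_disk[match.group("disk")].add(int(match.group("sector")))
    return dict(by_disk)


def disks_over_threshold(by_disk, threshold=3):
    """Return disks with at least `threshold` distinct reassigned sectors."""
    return sorted(
        disk for disk, sectors in by_disk.items() if len(sectors) >= threshold
    )


if __name__ == "__main__":
    with open("/etc/messages") as log:
        report = reassigned_sectors(log)
    for disk in disks_over_threshold(report):
        print(f"Disk {disk}: {len(report[disk])} sectors reassigned")
```

Whether a growing count means the disk should be failed by hand is a judgment call for the administrator; the script only makes the trend visible.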

Re: raid failure
by Jay Orr 03 May '00

On 3 May 2000, Mark D Fowle wrote:

> I have heard a few horror stories lately about netapps and multi-disk raid
> failures. Has anyone out there experienced this
> and what did you do for recovery ? Were there any warnings? I have not had
> this happen and would like to do as much
> as possible to prevent it.

I'll say this much for these Hardy Beasts: we had our A/C die on us overnight once, and I came in to find our filer dead. We had to swap out ALL the parts on the filer to bring it back up (it's an F330 we've had a few years and the room was 90+ degrees). However, we didn't lose a drive!

Knock on wood, we've never had a two-disk failure. To me, this illustrates that the chances of a double drive failure are quite low. Also, I always look at the drive lights as I walk by to make sure I didn't miss a log message about a drive failure.

my $0.02...

-----------
Jay Orr
Systems Administrator
Fujitsu Nexion Inc.
St. Louis, MO

Re: raid failure
by Robert L. Millner 03 May '00

Hey,

GDG> autosupport messages too. As I went back through the autosupport
GDG> logs that are e-mailed to me each week, I found that the problem
GDG> began approximately two weeks earlier. Every time the disk tried
GDG> to read a particular sector of the disk, an error message would
GDG> appear in the messages log indicating such an event had occurred. Had
GDG> I not been busily working other issues, to the detriment of my
GDG> filers, I would have failed this disk at least a week prior.

My immediate question to Netapp in this case would be why was the periodic disk scrubbing not sufficient to cause the failed sectors to be replaced (this was going on for two weeks)? Why upon detection of the block failure (after all, if a log message is generated, then the filer knows it happened) was the data not immediately reconstructed elsewhere and the disk blocks marked as unusable? A block sized RAID reconstruction and re-write should be a trivial problem for the filer to solve. This is the kind of detail I'd expect a storage vendor to place a much higher priority on than having a java GUI. This is a well known way that disks fail; not some mysterious voodoo issue. I worry about what other well known failure modes were left out till a later release of ONTAP.

GDG> occurred. Had I not been busily working other issues, to the
GDG> detriment of my filers, I would have failed this disk at least
GDG> a week prior.

Had Netapp not been busily working other issues, to the detriment of you and your users' time and data this disk would have failed itself or the filer would have taken some other corrective action on its own. You should have your own automated methods for looking for problems (like a script which analyses the logs and reports problems back to you). Don't be afraid to turn into a nasty bastard in a situation like this. None of my users would hesitate for a moment and that may be your last recourse to making sure that people understand the priority of certain kinds of issues.

I realize that I am being brutal to Netapp here but that kind of failure would cost us more than twice what we have invested in our entire Netapp infrastructure in time to rebuild the data. It gives me that cold, prickly, paranoid feeling about all the data we have on our filers. I also realize that there are other potential problems that would have caused a dual disk failure in one raid group. This specific problem should have been dealt with more gracefully by the filer on its own. If it didn't, then your case alone should have been enough to put it on the 'Must Fix This Immediately!' list.

Rob

"You're just the little bundle of negative reinforcement I've been looking for." -Mr. Gone

RE: raid failure
by Walters, Mike 03 May '00

Just another tool for your armoury against the unthinkable: SnapMirror. This gives you the ability to keep an asynchronous copy of your volumes on another filer with little overhead.

I won't bore you with detail (which you may already know), but you might want to have a scan down http://www.netapp.com/tech_library/3066.html for data protection strategies.

Cheers
Mike

> -----Original Message-----
> From: owner-dl-toasters(a)netapp.com
> [mailto:owner-dl-toasters@netapp.com]On Behalf Of Mark D Fowle
> Sent: 03 May 2000 12:00
> To: toasters
> Subject: raid failure
>
> I have heard a few horror stories lately about netapps and multi-disk raid
> failures. Has anyone out there experienced this
> and what did you do for recovery ? Were there any warnings? I have not had
> this happen and would like to do as much
> as possible to prevent it.
>
> Thanks,
> ===========================================================================
> Mark Fowle
> Caterpillar/BCP
> Cary North Carolina
> ===========================================================================
>
