toasters November 1997

toasters@lists.teaparty.net

55 participants
56 discussions

Re: Solaris and NetApps
by sthaug＠nethelp.no 10 Nov '97

10 Nov '97

> One of my clients is complaining that performance between Solaris 2.5.1
> systems (mainly Ultra 1s and 3000s) and an F210 is pretty damn awful. (The
> 210 is to be used for syslog, so the amount of disk writes isn't *that*
> high).
>
> I got him to switch to NFS v2 and UDP to get around timeout issues, but
> performance still sucks.

We use "hard,intr,bg" for an NFS v2 mount with a NetApp connected to a 2.5.1 Ultra box with a crossover cable. Works like a charm for us, as a News spool.

Steinar Haug, Nethelp consulting, sthaug(a)nethelp.no

4 5

Re: Solaris and NetApps
by mer＠world.evansville.net 10 Nov '97

10 Nov '97

On Nov 10, 12:30pm, Marc Nicholas wrote:
> One of my clients is complaining that performance between Solaris 2.5.1
> systems (mainly Ultra 1s and 3000s) and an F210 is pretty damn awful. (The
> 210 is to be used for syslog, so the amount of disk writes isn't *that*
> high).
>
> I got him to switch to NFS v2 and UDP to get around timeout issues, but
> performance still sucks.
>
> The entire LAN fabric is full duplex 100baseT switched through Cisco
> Catalyst 5000s.

Can I ask a dumb question? Is his 2.5.1 box running full duplex 100BT?

We experienced the same symptom with an Ultra 2. We also noticed lots of late collisions on the Cisco EtherSwitch 1200 uplink port serving the Ultra. When we switched the port to half duplex, the problem went away.

On a related topic, I spent a fruitless half hour trying to figure out how to configure our Ultra to do full duplex -- anyone know how to do it?

--
Marc Rouleau
VP and Chief Technology Officer
Voice: (812) 479-1700 Fax: (812) 479-3439
Network WCS http://www.networkwcs.com

3 3

New IBM drives
by joe＠vpop.net 10 Nov '97

10 Nov '97

Does anyone know if the new high-capacity IBM drives will work with toasters?

IBM to unveil disk technology: http://www.news.com/News/Item/0,4,16190,00.html?dtn.head

Thanks,
-joe

=============================================================================
* NewsHub: Updated every 15 minutes/24 hours a day!
* http://web.NewsHub.com/

1 0

Solaris and NetApps
by Marc Nicholas 10 Nov '97

10 Nov '97

One of my clients is complaining that performance between Solaris 2.5.1 systems (mainly Ultra 1s and 3000s) and an F210 is pretty damn awful. (The 210 is to be used for syslog, so the amount of disk writes isn't *that* high).

I got him to switch to NFS v2 and UDP to get around timeout issues, but performance still sucks.

The entire LAN fabric is full duplex 100baseT switched through Cisco Catalyst 5000s.

Anyone got any ideas why things might be so sloooow? :-/

What mount options are the rest of you using with Solaris?

Thanks,
-marc

---
Marc Nicholas - Hippocampus OSD Inc.
416 979 9000 - fax: 416 979 8223 - http://www.hippocampus.net
125 John St. - Suite #100 - Toronto - Ontario - M5V 2E2 - CANADA
"Inter/Intra/Extra[net] consulting, services, hardware and software sales"

3 2

Re: Possible software cause for total data loss?
by sirbruce＠ix.netcom.com 08 Nov '97

08 Nov '97

On 11/08/97 01:39:24 you wrote:
>> The fact that it continues to be the same machine leads me to think
>> that swapping out the NVRAM card and/or the SCSI cards may be in
>> order.
>
> I managed to capture the original kernel panic only once: "PANIC:
> ../common/wafl/nvlog.c: 1088: Assertion failure", although your guess
> is as good as mine why all subsequent reboots panic in disk.c. The
> suggestions I've received from Netapp support all involve getting a
> core dump over to them for analysis. Unfortunately, the filer never
> gets up to the point where it can do a savecore. :(

Well, if the NVRAM error caused something bogus to be written to disk, creating bad metadata in the unclean shutdown, then this could trigger another bug in the disk code where it reads the disks and recomputes parity. Or the NVRAM itself may actually have caused the error. It could even be that the disk.c message is misleading and is really talking to the NVRAM at that point. Alternatively, the bad data on the disk could have been caused by bad SCSI, but I admit this is less likely. Frankly there's no way to know without seeing the code (even the best modular code can have subtle dependencies and side effects that make it appear the error is something other than where it is).

I'm surprised support did not understand that your filer was actually DOWN, and thus you couldn't get the core. There are ways to boot from floppy to dump the NVRAM and fix the filesystem; I would press them for advice in that direction if they didn't understand your problem fully.

In any case, like I said, if it consistently is happening to this machine you could replace the NVRAM card, re-install it all over again, and see if it happens again.

Bruce

1 0

Re: Possible software cause for total data loss?
by sirbruce＠ix.netcom.com 08 Nov '97

08 Nov '97

On 11/07/97 18:43:45 you wrote:
>On Tue, 4 Nov 1997 sirbruce(a)ix.netcom.com wrote:
>>
>> 1. Filer crashes while running - Reboot
>> 2. Filer crashes replaying NVRAM - Reboot
>> 3. Filer crashes again while replaying NVRAM - Reboot
>> 4. Filer realizes it's failed replaying NVRAM twice in a row, so
>>    it flags it as bad, dumps the NVRAM, and - Reboot
>
> After replacing all the disks with new ones, the same crash popped
>up again on that filer, after two more days of heavy reads and writes.
>I rebooted it ten times in a row... kernel panic every time on
>"../driver/disk/disk.c:2633: Assertion failure" right after the
>"Recomputing parity in NVRAM" message (I assume that means it never
>gets around to replaying the WAFL logs?).

Did it ever mention dumping the NVRAM? Or is it always crashing before it ever replays NVRAM? I think I understand what you're saying above but you didn't mention any other error messages from the other reboots so I wanted to be sure.

> Call Netapp tech support, was told an on-call engineer would call
>back... haven't heard anything back yet. *sigh*

It sounds like the disks themselves have become corrupted in such a way that it's dying on initialization... they will probably have you boot off floppy to wack the filesystem, and/or dump the NVRAM manually.

The fact that it continues to be the same machine leads me to think that swapping out the NVRAM card and/or the SCSI cards may be in order. But, I'm no expert support person and I don't have access to the code, so Netapp may have already figured out what this bug is and the solution is totally different. :)

Bruce

2 1

Alpha future
by Gregory M. Paris 08 Nov '97

08 Nov '97

Any thoughts on what the Intel/Digital settlement means to NetApp and its customers? It sure looks (to me anyway) as though the Alpha is dead.

--
Gregory M. Paris            Senior Systems Administrator
Bose Corporation            Email: paris(a)bose.com
The Mountain, MS#258        Phone: +1 508 766-4072
Framingham, MA 01701-9168   FAX: +1 508 820-0173

9 10

Re: Possible software cause for total data loss?
by sirbruce＠ix.netcom.com 07 Nov '97

07 Nov '97

On 11/04/97 10:48:56 you wrote:
> During the pre-production testing of one of our F230 filers, we
>discovered a problem with one of them that we were only able to fix by
>rebuilding the RAID (and thus losing whatever OS and data was on the
>filer).

Be careful with precise words. "Rebuilding the RAID" is something that happens when a disk fails, and it doesn't cause you to lose your data. If you mean, say, re-initialize the filesystem, then that makes more sense.

> Part of the tests consisted of filling up the filesystem via NFS
>and NDMP copies from a host Ultrasparc. Three other F230's of
>identical configuration survived the tests, but the remaining F230
>experienced the following panic four times:
>
>PANIC: ../common/wafl/nvlog.c: 1088: Assertion failure
>
>
> I will be running the NVRAM diagnostics later today to see if they
>turn up anything. However, more distressing is the behaviour of the
>Netapp upon reboot:

You are correct that it's probably a software bug, although running the NVRAM diagnostics (and perhaps re-seating the card) is certainly something you should try.

>[... other boot messages deleted...]
>Loading filesystem.
>Recomputing parity in NVRAM
>
>PANIC: ../driver/disk/disk.c:2633: Assertion failure.
>
>version: NetApp Release 4.2a: Fri Sep 5 09:36:36 PDT 1997
>cc flags: 3
>dumping core: .......... Old core present on disk --- not dumped.
>Program terminated
>ok
>
>
> At this point the filer is inaccessible, and I can't find a way to
>get it up and running.

Why was it "inaccessible"? Just reboot it again.

>Is there a way to flush the NVRAM or ignore an
>existing dump... some way to turn NFS back on so the data can be
>retrieved.

Yes... just keep rebooting and eventually it will throw away the NVRAM. As for "ignore an existing dump", no I don't think so, but that's okay... the fact that the filer can't dump core isn't preventing you from rebooting (although I think it does prevent the auto-reboot).

>Booting the kernel off floppy doesn't help because it
>tries to replay the WAFL logs too, and another panic occurs.

When NVRAM is corrupt, you have to keep rebooting several times. The sequence is usually like this.

1. Filer crashes while running - Reboot
2. Filer crashes replaying NVRAM - Reboot
3. Filer crashes again while replaying NVRAM - Reboot
4. Filer realizes it's failed replaying NVRAM twice in a row, so it flags it as bad, dumps the NVRAM, and - Reboot
5. Filer comes back up, probably in degraded mode, and is thus reconstructing. If there has been filesystem damage it may crash here again, and reboot again.

If you still can't get it up (it may say "Filesystem may be scrambled") or you can't get it up for any length of time, you should call Netapp support and have them help you with the procedure for fixing the filesystem (wack) from floppy.

The real kickers in this are you have to "know" that it'll do 2 and 3, and won't just keep rebooting forever. I think I've seen cases where it takes more than that for it to jettison NVRAM, but I can't be positive. This made sense from a design point of view, to only give up if you fail to replay the NVRAM twice in a row, but in reality it seems that with most bugs (not all) if it fails once, it'll fail again.

Furthermore, once it decides NVRAM is corrupt, it tries to dump core and reboot *AGAIN*. The design thought here was again a sound one - get a core dump so we can look at the corrupt NVRAM and figure out what's wrong. However, in reality, if you've gotten to this point you've probably already crashed, and dumped core once, so you'll never be able to see this faulty NVRAM core... at least not until Netapp starts supporting multiple cores.

The other bad thing about this sequence is you have several crashes stemming from the original crash, and possibly even several different bugs, but you'll never be able to get the cores from anything but the first one.

I think there is a way to bypass some of this by booting off floppy and jettisoning the NVRAM manually, but given the time involved you are probably better off just rebooting the filer again.

>The only
>way around I've found is to wipe out the filesystem and start over
>again (obviously not the optimal solution). Ideas?

The above should help.

Bruce
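
For readers following the thread, the reboot sequence in this post can be sketched as a toy state machine. This is only a model of the behavior as described here (two consecutive failed NVRAM replays, then the NVRAM is flagged bad and dumped, then the filer comes up); the class and function names are hypothetical, the assumption that a corrupt NVRAM fails every replay is a simplification, and nothing here reflects actual NetApp code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFiler:
    """Hypothetical model of the boot behavior described in the post."""
    nvram_corrupt: bool = True      # assumption: a corrupt NVRAM fails every replay
    replay_failures: int = 0        # consecutive failed NVRAM replays so far
    nvram_discarded: bool = False   # set once the filer gives up on the NVRAM
    log: list = field(default_factory=list)

    def boot(self) -> bool:
        """Run one boot attempt; return True if the filer comes up."""
        if self.nvram_discarded or not self.nvram_corrupt:
            self.log.append("filer up")
            return True
        if self.replay_failures >= 2:
            # Step 4: two failed replays in a row, so flag NVRAM bad and dump it.
            self.nvram_discarded = True
            self.log.append("NVRAM flagged bad and dumped; reboot")
            return False
        # Steps 2 and 3: panic while replaying NVRAM.
        self.replay_failures += 1
        self.log.append(f"panic replaying NVRAM ({self.replay_failures}); reboot")
        return False


def reboots_until_up(filer: ToyFiler, limit: int = 10):
    """Keep rebooting; return the boot attempt that succeeded, or None."""
    for attempt in range(1, limit + 1):
        if filer.boot():
            return attempt
    return None


if __name__ == "__main__":
    filer = ToyFiler()
    print(reboots_until_up(filer))
    print("\n".join(filer.log))
```

In this model the filer comes up on the fourth boot after the original crash, which matches steps 2 through 5 above. It does not model the case in the follow-up messages, where the panic happens in disk.c before the NVRAM replay is ever reached.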

2 2

netapp NFS server crash by FreeBSD client [w/patch] (fwd)
by Tom Yates 07 Nov '97

07 Nov '97

this just in from the bugtraq mailing list. i figure *some* of us probably aren't on that. i also don't expect thousands of people to be trying this exploit, but given that it's just gone out on bugtraq, if your toaster panics anytime soon, you might want to bear this in mind.

needless to say, i haven't tried it. anyone want to have a go?

Tom Yates - Unix Chap - The Mathworks, Inc. - +1 (508) 647 7561
MAG#65061 DoD#0135 AMA#461546 1024/CFDFDE39 0C E7 46 60 BB 96 87 05 04 BD FB F8 BB 20 C1 8C

---------- Forwarded message ----------
Date: Wed, 5 Nov 1997 21:34:00 -0800
From: "Dmitry Kohmanyuk" <dk(a)GENESYSLAB.COM>
To: BUGTRAQ(a)NETSPACE.ORG
Subject: netapp NFS server crash by FreeBSD client [w/patch]

this is only relevant to those using NetApp NFS file servers.

background: there are 2 versions of NFS in town, v2 and v3. In v3, one of things introduced was ability to read names of files in directory with stat(2)ing them at the same time; the procedure name is readdirplus. It can be used to speed up programs like ls(1).

Apparently, NFS code in FreeBSD 2.2 (derived from 4.4BSD code, so perhaps this applies to all of modern BSD systems here) allow client to specify this in mount options without using NFS v3. This meaningless set of options panics NetApp file server.

the following simple patch (attached) for /usr/src/sys/nfs/nfs_bio.c fixes this problem.

--- nfs_bio.c.ok Wed Nov 5 20:11:17 1997
+++ nfs_bio.c Wed Nov 5 20:14:06 1997
@@ -1031,6 +1031,8 @@
 	case VDIR:
 		nfsstats.readdir_bios++;
 		uiop->uio_offset = ((u_quad_t)bp->b_lblkno) * NFS_DIRBLKSIZ;
+		if (!(nmp->nm_flag & NFSMNT_NFSV3))
+			nmp->nm_flag &= ~NFSMNT_RDIRPLUS; /* dk(a)farm.org */
 		if (nmp->nm_flag & NFSMNT_RDIRPLUS) {
 			error = nfs_readdirplusrpc(vp, uiop, cr);
 			if (error == NFSERR_NOTSUPP)
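
The two added lines in the patch amount to a flag-masking check: if the mount is not NFS v3, clear the readdirplus bit before it is tested. A minimal Python model of that logic follows. The bit values are placeholders chosen for illustration (the real constants are defined in the FreeBSD NFS headers), and `sanitize_mount_flags` is a hypothetical name, not a kernel function.

```python
# Placeholder bit values for illustration only; the real ones are
# defined in the FreeBSD kernel's NFS headers.
NFSMNT_NFSV3 = 0x1
NFSMNT_RDIRPLUS = 0x2


def sanitize_mount_flags(nm_flag: int) -> int:
    """Clear the readdirplus bit unless the mount is NFS v3.

    Mirrors the check the patch adds before the RDIRPLUS test.
    """
    if not (nm_flag & NFSMNT_NFSV3):
        nm_flag &= ~NFSMNT_RDIRPLUS
    return nm_flag


if __name__ == "__main__":
    # v2 mount that asked for readdirplus: the bit is dropped.
    print(sanitize_mount_flags(NFSMNT_RDIRPLUS))
    # v3 mount with readdirplus: flags pass through unchanged.
    print(sanitize_mount_flags(NFSMNT_NFSV3 | NFSMNT_RDIRPLUS))
```

With the bit cleared, a v2 mount never reaches the readdirplus RPC path, so the combination reported to panic the filer is not sent by a patched client.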

3 2

archive
by Tom Yates 07 Nov '97

07 Nov '97

ok, i did some work. the list archive is better now, it automatically rebuilds every time someone sends a message to the list, so it's a lot more up-to-date than it has been at any time recently.

it moved a little, too... go to http://teaparty.mathworks.com:1999/toasters/ to find it.

hope this is useful for people.

Tom Yates - Unix Chap - The Mathworks, Inc. - +1 (508) 647 7561
MAG#65061 DoD#0135 AMA#461546 1024/CFDFDE39 0C E7 46 60 BB 96 87 05 04 BD FB F8 BB 20 C1 8C

1 0
