toasters December 1999

toasters@lists.teaparty.net

100 participants
81 discussions

RE: Some fun things not to try at home
by Bulfer, David 22 Dec '99

Please be careful. I do not believe that your StorageWorks shelf has enough power for 9G drives. Buying a new supply from Compaq would not solve the problem either. Your results could be quite unhappy.

David Bulfer
Director of Platform Engineering
Network Appliance, Inc.

-----Original Message-----
From: Priebe, Jason [mailto:priebe@wral-tv.com]
Sent: Wednesday, December 22, 1999 1:22 PM
To: toasters(a)mathworks.com
Subject: RE: Some fun things not to try at home

> -----Original Message-----
> From: Bruce Sterling Woodcock [mailto:sirbruce@ix.netcom.com]
> If you find out something
> useful in your filer experience, please pass it along, even if it's of
> the "I didn't read the manual, so I did this and it crashed" variety.

While we're airing dirty laundry -- a couple of months ago, I mentioned to the list that we wanted to try to replace our F210's 4GB SCSI disks with 9GB disks (full shelf, no more shelves available from NetApp).

I replaced two of the 4GB disks with 9GB ones, and all was well (of course, the filer only used 4 of the 9GB, but we figured we'd dump to tape, rebuild the file system, then restore once all the drives were in place).

So I'm opening the 3rd disk carrier, removing the flex circuit from the disk drive (has anyone else ever tried to do this? How the #$%! do you do it?) by prying with a screwdriver, and the screwdriver goes right through the circuit. D'oh!

I did find that I was able to call Compaq and order a new circuit (only $96 for that tiny little piece!). Still waiting on delivery, though. I'll let you know how it goes.

Wish I'd known you could still order these parts -- I could have built a couple of my own carriers and had them ready to go into the filer instead of failing a drive, disassembling the carrier, reassembling the carrier, and reinstalling.

Jason Priebe
WRAL OnLine
http://www.wral-tv.com/

Some fun things not to try at home
by Brian Rice 22 Dec '99

Here is a tale of woe, which you will find entertaining not only because it offers lotsa useful information about filer management, but also because it makes me look like an idiot.

Several weeks ago, some co-workers of mine and I undertook to add a fibre-channel controller and shelf to each of two F630's, each of which previously had only one dual-SCSI adapter. We did the wrong thing, as it turned out (much later).

The FC cards are PCI cards with a long edge connector. F630's have two kinds of PCI slots: some long, some short. We naively jumped to the conclusion that long cards go in a long slot, so we installed the FC cards in slot 10, rather than slot 7, which had been our plan before opening the filers.

Slot 10, it turns out, was a bad idea. Disk controller cards on F630's are only supported in slots 5-8, inclusive. And, believe it or not, you are supposed to put the card in there in spite of the fact that it is too big...you just leave the extra section of connector hanging in mid-air. So we had just put the filers in an illegal configuration.

You may be interested to know the failure mode: None. The filers booted right up, found all the disks, and announced them; we added them to a volume, and we were off to the races.

A week later, the regular disk scrubs found bizillions of parity inconsistencies on each filer and fixed them. Then the filers started crashing ("PANIC: Freeing free block"). They would only restart after a run of wackz. Then the next disk scrub would put us on the path to ruin again.

After several interactions with several levels of NetApp technical support, eventually all parties came to understand what had happened, and the fact that the FC cards are not supported in slot 10. So we got the FC cards put into the right slots, did a wackz, and breathed easier.

Then one of the filers crashed again.

The culprit there: wackz and disk scrubs look at two completely different pieces of the puzzle. I had the (once again) mistaken idea that wackz fixed a superset of the problems that a disk scrub fixes. Wrong. Wackz only looks at the file system structure and ignores the parity data; to fix the parity data, you need a disk scrub.

Moral of the story: to make sure your disks have happy data on them, do both a wackz and a disk scrub. Yer not done until you do both.

With that problem fixed, we breathed a sigh of relief.

Then one of the filers crashed again.

It turned out that that filer had been in the middle of doing its nightly backup when it crashed the first time, and so it had created a snapshot. Call it "fred". The system came back up, and eventually got fixed, wacked, and scrubbed, but fred was still corrupt. My backup script contained this:

    rsh toaster snap create fred

It did not check to see whether fred exists already; if so, the snap create would just fail, and then the backup would proceed to dump the (existing) fred. Which, if fred is corrupt, would panic the system.

I deleted fred. Then the filer was able to back itself up fine.

So: make sure your snapshots are fresh. My backup script now starts like this:

    rsh toaster snap list | grep fred > /dev/null 2>&1
    if [ $? -eq 0 ]
    then
        echo I refuse to work under these conditions
        exit 5
    fi
    rsh toaster snap create fred

Brian

P.S. If you know about PCI, you may know that long PCI cards are 64-bit, and short ones are 32-bit. I've omitted that from the discussion above, since what leaps out at you first when you go to install a PCI card is not its bit-width but its physical width...the latter was what got us started down this awful path in the first place.

P.P.S. NetApp Engineering is looking into building checks for illegal configurations into future versions of ONTAP.
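One possible tightening of the snapshot guard, offered as a sketch rather than as Brian's actual script: a plain `grep fred` also matches names such as `alfred` or `fred.old`, so the check below compares the whole name. The function names `snapshot_exists` and `create_fresh_snapshot` and the `RSH` override are hypothetical, and the sketch assumes that the snapshot name is the last field on each line of `snap list` output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if a snapshot with exactly the given
# name appears in "snap list" output supplied on stdin.
# Assumption: the snapshot name is the last whitespace-separated field.
snapshot_exists() {
    awk -v name="$1" '$NF == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical wrapper around the guard from the post. The filer and
# snapshot names are arguments; RSH can be overridden for dry runs.
create_fresh_snapshot() {
    filer=$1
    snap=$2
    if ${RSH:-rsh} "$filer" snap list | snapshot_exists "$snap"; then
        echo "snapshot $snap already exists on $filer; refusing to dump it" >&2
        return 5
    fi
    ${RSH:-rsh} "$filer" snap create "$snap"
}
```

A backup script could then call `create_fresh_snapshot toaster fred || exit 5` before starting the dump. As in the original guard, a failed `rsh` looks the same as "no such snapshot", so this does not protect against an unreachable filer.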

virus detection
by Robert.Ryan＠chase.com 22 Dec '99

What are the best strategies for detecting and cleaning virus-infected files on a filer? This would include not only files written to/from CIFS but also files that are written to shared volumes by UNIX and accessed by CIFS clients.

Does anyone have experience with the performance impact of continual scans, and the amount of time required to scan a large volume?

Thanks!

modifying root volume
by Igor Schein 22 Dec '99

Hi,

I have 1 root volume with 2 raid groups and 1 spare disk. Can I

1) make it 1 raid group and/or
2) free up another drive for a spare (the current disk usage is minimal)

without destroying the volume and consequently having to reinstall the OS?

Thanks

Igor

restore(1) behavior
by Igor Schein 21 Dec '99

Hi,

I have a 740 filer and a tape library attached to it. When I do 'restore tvf' to see what's on the tape, all the disks start spinning intensively, as if I'm actually extracting from the tape rather than listing the tape's contents. I'm not sure I understand why that's happening. Any idea?

Thanks

Igor

RE: Files disappear after some random interval
by Barrett,Eric 21 Dec '99

Your NFS client most likely has an entry in the crontab to delete all files in the /tmp directory (or all files older than a certain amount of time); this script is probably not smart enough to tell files on local storage from remote storage.

Two things jumped out at me immediately:

1) You're mounting the filer on /tmp.
2) The deletions occur at 5:00am Saturday, which is prime cron time.

Note also the ascending time of the directories -- as the file-removing script swept through the directories, it updated their modification times.

The moral: NEVER, EVER, EVER NFS mount anything on or in /tmp. :)

--
Eric Barrett / Technical Support Engineer, Network Appliance
Direct: 1-408-822-4779 / Pager: 1-408-939-7945
Get answers NOW! - NetApp On the Web - http://now.netapp.com

> -----Original Message-----
> From: Dave Toal [mailto:dave_toal@t-t.com]
> Sent: Monday, December 20, 1999 4:11 PM
> To: jq(a)opensystems.com; toasters(a)mathworks.com; unixgroup(a)t-t.com;
> support(a)opensystems.com
> Subject: Files disappear after some random interval
>
> Folks,
>
> We have a problem with files disappearing from a filer. The box is a 520 running 5.2.1. THREE times now, we've mounted one of its volumes to a production box, done tar cvf - | ( cd mount_point; tar xvf -), diff -r with no errors -- and then a day or two later the files that were copied are gone. Directories are there still, and symbolic links, but NO FILES.
>
> Snapshots were turned off after the second time it happened; snap reserve is set to 0% and snap sched is 0 0 0 for that volume. There are no remarks in messages -- nothing besides [statd] time, up x days. The directories show size but ls returns nothing.
>
> This is the volume on the netapp, mounted on the sun box:
>
> idsmajor:root:/mnt/etc>mount -p | grep /tmp/a
> sanihome:/vol/nfs_archive - /tmp/a nfs - no rw
> idsmajor:root:/mnt/etc>ls -l /tmp/a | head
> total 81288
> drwxr-xr-x 2 ids users 344064 Dec 19 05:29 apr_1995
> drwxr-xr-x 2 ids users 393216 Dec 19 05:31 apr_1996
> drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 apr_1997
> drwxr-xr-x 2 ids users 114688 Dec 19 05:52 aug_1994
> drwxr-xr-x 2 ids users 380928 Dec 19 05:53 aug_1995
> drwxr-xr-x 2 ids users 409600 Dec 19 05:55 aug_1996
> drwxr-xr-x 2 ids users 4227072 Dec 19 06:16 aug_1997
> drwxr-xr-x 2 ids users 282624 Dec 19 06:18 dec_1994
> drwxr-xr-x 2 ids users 290816 Dec 19 06:19 dec_1995
> idsmajor:root:/mnt/etc>ls -als /tmp/a/* | head -20
> 0 lrwxrwxrwx 1 ids users 29 Dec 7 16:34 /tmp/a/dec_1997 -> /archive/nfs_archive/dec_1997
> 0 lrwxrwxrwx 1 ids users 29 Dec 7 23:37 /tmp/a/nov_1997 -> /archive/nfs_archive/nov_1997
> 0 lrwxrwxrwx 1 ids users 29 Dec 8 01:05 /tmp/a/oct_1997 -> /archive/nfs_archive/oct_1997
> 0 lrwxrwxrwx 1 ids users 29 Dec 8 01:54 /tmp/a/sep_1997 -> /archive/nfs_archive/sep_1997
>
> /tmp/a/apr_1995:
> total 688
> 680 drwxr-xr-x 2 ids users 344064 Dec 19 05:29 .
> 8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..
>
> /tmp/a/apr_1996:
> total 784
> 776 drwxr-xr-x 2 ids users 393216 Dec 19 05:31 .
> 8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..
>
> /tmp/a/apr_1997:
> total 7808
> 7800 drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 .
> 8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..
>
> idsmajor:root:/mnt/etc>
>
> There were no problems with applications pulling files from these directories on the netapp on Friday, Dec 18. This volume is not mounted anywhere else. This is a production system -- nothing writes to the mount point.
>
> What happened Saturday morning at 5 AM? Why did all these directories get touched?
>
> Dave
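To check a client for the condition Eric describes, something like the following could be run before any /tmp cleanup job is trusted. It is only a sketch: the function name `nfs_mounts_under_tmp` is hypothetical, and the field layout (resource, fsck device, mount point, file system type) is assumed from the single `mount -p` line quoted in the post.

```shell
#!/bin/sh
# Hypothetical helper: reads Solaris-style "mount -p" output on stdin and
# prints every NFS mount point that is /tmp itself or lives below it.
# Assumption: field 3 is the mount point and field 4 the file system type.
nfs_mounts_under_tmp() {
    awk '$4 == "nfs" && ($3 == "/tmp" || index($3, "/tmp/") == 1) { print $3 }'
}
```

On the client it would be used as `mount -p | nfs_mounts_under_tmp`; any output means a remote file system is exposed to whatever sweeps /tmp. Separately, a cleanup job built on `find` can be told not to cross into other mounted file systems with `-xdev`, though whether the job on this particular host uses `find` at all is not known from the thread.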

Files disappear after some random interval
by Dave Toal 21 Dec '99

Folks,

We have a problem with files disappearing from a filer. The box is a 520 running 5.2.1. THREE times now, we've mounted one of its volumes to a production box, done tar cvf - | ( cd mount_point; tar xvf -), diff -r with no errors -- and then a day or two later the files that were copied are gone. Directories are there still, and symbolic links, but NO FILES.

Snapshots were turned off after the second time it happened; snap reserve is set to 0% and snap sched is 0 0 0 for that volume. There are no remarks in messages -- nothing besides [statd] time, up x days. The directories show size but ls returns nothing.

This is the volume on the netapp, mounted on the sun box:

idsmajor:root:/mnt/etc>mount -p | grep /tmp/a
sanihome:/vol/nfs_archive - /tmp/a nfs - no rw
idsmajor:root:/mnt/etc>ls -l /tmp/a | head
total 81288
drwxr-xr-x 2 ids users 344064 Dec 19 05:29 apr_1995
drwxr-xr-x 2 ids users 393216 Dec 19 05:31 apr_1996
drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 apr_1997
drwxr-xr-x 2 ids users 114688 Dec 19 05:52 aug_1994
drwxr-xr-x 2 ids users 380928 Dec 19 05:53 aug_1995
drwxr-xr-x 2 ids users 409600 Dec 19 05:55 aug_1996
drwxr-xr-x 2 ids users 4227072 Dec 19 06:16 aug_1997
drwxr-xr-x 2 ids users 282624 Dec 19 06:18 dec_1994
drwxr-xr-x 2 ids users 290816 Dec 19 06:19 dec_1995
idsmajor:root:/mnt/etc>ls -als /tmp/a/* | head -20
0 lrwxrwxrwx 1 ids users 29 Dec 7 16:34 /tmp/a/dec_1997 -> /archive/nfs_archive/dec_1997
0 lrwxrwxrwx 1 ids users 29 Dec 7 23:37 /tmp/a/nov_1997 -> /archive/nfs_archive/nov_1997
0 lrwxrwxrwx 1 ids users 29 Dec 8 01:05 /tmp/a/oct_1997 -> /archive/nfs_archive/oct_1997
0 lrwxrwxrwx 1 ids users 29 Dec 8 01:54 /tmp/a/sep_1997 -> /archive/nfs_archive/sep_1997

/tmp/a/apr_1995:
total 688
680 drwxr-xr-x 2 ids users 344064 Dec 19 05:29 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..

/tmp/a/apr_1996:
total 784
776 drwxr-xr-x 2 ids users 393216 Dec 19 05:31 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..

/tmp/a/apr_1997:
total 7808
7800 drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..

idsmajor:root:/mnt/etc>

There were no problems with applications pulling files from these directories on the netapp on Friday, Dec 18. This volume is not mounted anywhere else. This is a production system -- nothing writes to the mount point.

What happened Saturday morning at 5 AM? Why did all these directories get touched?

Dave

tcp stack reset, why; and automating a fix?
by Dave Toal 20 Dec '99

Folks,

Hey again, Jay. I've got a question about a tcp stack failure on one of the clustered 740's, that happened over the weekend. Gary Andrade sez:

> The box would respond to ping (inbound), but any outbound attempts
> failed. The outbound TCP stack was toast. Skip and I conferred on
> the situation and the box was rebooted (problem resolved).
>
> The fail over to Wesson did not occur due to the inbound ping working,
> the other server sensed Smith as working.

I'm wondering: is it possible to configure clustering (smith and wesson are a pair of 740's) to detect tcp failure? There are no cluster-level error messages on smith.

messages.0
...
Sat Dec 18 23:52:41 GMT [smith: rc]: DNS server for domain "saegis.com" not responding : Connection timed out.
Sun Dec 19 00:00:00 GMT [smith: statd]: 12:00am up 29 days, 21:18 138103986 NFS ops, 0 CIFS ops, 0 HTTP ops

And this is the reboot:

[appworx@:/smith/spool/etc]$ head messages
Sun Dec 19 00:26:45 GMT [smith: rc]: System shut down with "reboot" command.
Sun Dec 19 00:26:45 GMT [smith: cf_main]: Cluster monitor: takeover of partner disabled (local halt in progress)

The web servers did show nfs failures while smith was having its problem:

...
Dec 19 00:01:43 ww5 automountd[235]: server smith not responding
Dec 19 00:01:43 ww5 last message repeated 6 times
Dec 19 00:08:14 ww5 automountd[235]: server smith not responding
Dec 19 00:08:15 ww5 last message repeated 6 times

This server was rebooted as well, for the same reason -- stack failure.

Can you suggest some possible reasons for the filer's tcp stack failure? We did have an incident last week where both web servers needed to be rebooted for the same reason -- tcp resets from stack overflow; the tentative theory at the moment is syn flood attack. But the netapp isn't exposed to the internet.

If it's not possible to configure the filer to fail over on failed ping, then I'll script something.

My second question: what do I need to sacrifice to the gods to get complete command-line rsh access?

smith> ping ww5
ww5.saegis.com is alive
smith> Connection closed by foreign host.
[appworx@:/apps/appworxl]$ rsh smith 'sysconfig -r' | grep root
Volume spool (root)
[appworx@:/apps/appworx]$ rsh smith ping ww5
ping not found. Type '?' for a list of commands
[appworx@:/apps/appworx]$

I could set up a test such that if nfs fails but the filer can still be pinged, then [reboot/fail over]. Feels like a kludge, though... not specific enough. I'd prefer to have the filers handle this on their own.

Dave
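The "nfs fails but the filer can still be pinged" test Dave mentions could be sketched roughly as below. Everything here is an assumption for illustration: the function names `classify_filer` and `check_filer`, the `PING_CMD` and `NFS_CMD` overrides, and the choice of `rpcinfo -t` as the NFS probe. Ping syntax in particular differs between operating systems, so the default is only a placeholder, and the action to take on a "suspect" result is left out on purpose.

```shell
#!/bin/sh
# Hypothetical decision helper: takes the exit statuses of a ping probe and
# an NFS probe (0 = success) and prints one word describing the filer.
classify_filer() {
    ping_rc=$1
    nfs_rc=$2
    if [ "$ping_rc" -ne 0 ]; then
        echo down        # no reply to ping at all
    elif [ "$nfs_rc" -ne 0 ]; then
        echo suspect     # answers ping but not NFS: the case from the post
    else
        echo ok
    fi
}

# Hypothetical probe wrapper. PING_CMD and NFS_CMD are assumptions and can
# be overridden; "rpcinfo -t host nfs" asks the NFS service over TCP.
check_filer() {
    host=$1
    p=0; ${PING_CMD:-ping} "$host" >/dev/null 2>&1 || p=$?
    n=0; ${NFS_CMD:-rpcinfo -t} "$host" nfs >/dev/null 2>&1 || n=$?
    classify_filer "$p" "$n"
}
```

A monitoring host might call `check_filer smith` from cron and page someone (or trigger a manual takeover) only after several consecutive `suspect` results, which keeps one dropped probe from causing a failover.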

Re: toasters and auto negotiation (off topic: Cabletron gripe)
by mds＠gbnet.net 18 Dec '99

On Fri 17 Dec, 1999, Bruce Walker <bmw(a)visgen.com> wrote:
> Did you enjoy the cranky, confusing and decidedly non-orthogonal
> telnet UI? That sad piece of work in my ESX1320 switch soured me

Oh indeed, a joy.

>-- End of excerpt from Bruce Walker --

-Mark ... an Englishman in London ...

RE: toasters and auto negotiation
by mds＠gbnet.net 18 Dec '99

On Fri 17 Dec, 1999, "Walsh, Warren (MN65)" <Walsh_Warren(a)htc.honeywell.com> wrote:
> We recently converted our 330 and 540 from FDDI to 100BaseT. We found that
> we had to manually force both the 100BaseT card in the toaster and the
> specific port on our Cabletron Smart Switch 9000 to 100BaseT-FD. For the
> 100BaseT card, it was simply changing the auto to 100tx-fd:
>
> ifconfig e5 `hostname`-e5 mediatype 100tx-fd netmask 255.255.248.0
>
> Funny how that auto-negotiation thing never really works, and how much it
> really affects performance.

Funny, I was only talking with a friend today about how autonegotiation between Suns and Ciscos works fine, and how Cabletron kit has such irritatingly non-functional autonegotiation in contrast.

In my last job, running any number of Suns, SGIs and NetApps using Cabletron switches, the network guys and I always went hunting for the ports where the collisions were high and the autonegotiation had been left on, or turned back on. As policy we decided to pin both ends to 100BaseTX-fd no-autonegotiation, because anything else was markedly less effective.

This wasn't the only problem we had with Cabletron kit though - there were some horrors with the spanning tree reconfigurations, and packet fragmentation, and adjacent switches somehow bollixing sets of ports on other switches. Put me right off Cabletron as a networking equipment vendor I have to say. YMMV.

I always quite liked FDDI: most of our filers were on FDDI and, perhaps because of the extra smarts in the cards, the performance was always pretty darned good. We even got our biggest machines their own Gigaswitch ports - though I never figured out if the cards (DEC cards in NetApp, DEC gigaswitch) went into the full-duplex mode that the DEC network engineers thought they might be able to do when plugged straight into the Gigaswitch.

Sure made cabling in the machine room easier using FDDI too. Until they put decent amounts of structured wiring in.

> -warren
>
> Warren Walsh
> Honeywell Technology Center
>-- End of excerpt from "Walsh, Warren (MN65)" --

-Mark ... an Englishman in London ...
