toasters August 1997

toasters@lists.teaparty.net

21 participants
20 discussions

Re: More capacity on lower-end filers
by hitz＠netapp.com 10 Feb '98

> On Thu, 12 Jun 1997, Dave Hitz wrote:
> >
> > Although marketing certainly has input, the space restrictions also
> > come from engineering droids.  During normal operation the smaller
> > machines could certainly handle more disk, but the time required for
> > something like RAID reconstruction could get dangerously long.
>
> Is this the reason why the F210 can still only use around 50GB of
> disk, even though it can physically accommodate more using 9GB drives?

In order for a RAID reconstruction to complete, you have to read all of the data on all of the other disks. So RAID reconstruction time is proportional to the amount of data, not to the number of disks.

Dave Hitz  hitz(a)netapp.com
Network Appliance  (408) 367-3106
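As a rough illustration of the proportionality described above, here is a minimal Python sketch. It assumes the rebuild is limited only by a fixed sustained read rate; the function name and the 5 MB/s figure are hypothetical, chosen for illustration, and are not measurements from any filer.

```python
def rebuild_hours(data_gb: float, read_mb_per_s: float) -> float:
    """Hours needed to read data_gb gigabytes at read_mb_per_s MB/s."""
    return data_gb * 1024 / read_mb_per_s / 3600


# Hypothetical sustained read rate, chosen only for illustration.
RATE_MB_S = 5.0

if __name__ == "__main__":
    # Doubling the amount of data doubles the estimated rebuild time.
    for data_gb in (25, 50, 100):
        print(f"{data_gb:>4} GB -> {rebuild_hours(data_gb, RATE_MB_S):.1f} h")
```

Under this simple model the disk count never appears: only the total amount of data to be read and the read rate matter.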

"Interrupted system call" to F230 news spool
by Brian Tao 17 Sep '97

One of my inn 1.4unoff4 news reader servers started throttling itself just today with "Interrupted system call writing article file" (happened twice in the past 24 hours). The spool is on an F230 running 4.0.3, with 256MB of read cache and 4MB of write cache. The news server is an Ultra 170 with 512MB of RAM and ~250 to 300 readers around peak times. The two are on an FDDI ring.

The F230 hovers around 65% CPU usage, so I don't think that's the problem, but the Ultra is reporting 900 to 1200 packets per second both in and out of its FDDI interface. Half of its time is spent in the kernel, according to top(1). The mounts are NFSv3 over UDP. Would dropping back down to NFSv2 help any? I'm trying to determine if this is a network congestion problem, or an OS limitation (on either the Netapp or the Sun).

--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"

Re: restoring from snapshot
by sirbruce＠ix.netcom.com 26 Aug '97

On 08/26/97 15:35:53 you wrote:
>Unfortunately there is no magic way, so you simply have
>to use your favorite copy command.
>
>The "cp" command does scary things with symlinks and
>device drivers, so "tar | tar" like you've done, or
>"find | cpio -pdmv" are better.

Yeah, tar has bugs in different versions... and cp is just bad. I prefer "find . -print | cpio -pdumv /path/to/newdir" myself. However, long filenames with spaces in them can break this too. :) Gotta write a fancy shell-script wrapper or something...

Bruce
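For the filenames-with-spaces problem mentioned above, one alternative to a shell wrapper is a short Python script. This is only a sketch, not the method anyone in the thread used: it swaps in the standard library's `shutil.copytree` for cpio, the function name `restore_from_snapshot` is hypothetical, and it assumes the target directory does not exist yet.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_dir: str, target_dir: str) -> Path:
    """Copy snapshot_dir to target_dir, which must not exist yet."""
    # symlinks=True recreates symlinks as links instead of following them.
    # The default copy function (copy2) also preserves mtimes and mode bits.
    # Paths are passed as whole strings, so spaces in names are harmless.
    return Path(shutil.copytree(snapshot_dir, target_dir, symlinks=True))
```

Note the limits of this approach: file ownership is not preserved, and special files such as devices and FIFOs are not handled, so tar or cpio remain the better tools for trees that contain them.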

restoring from snapshot
by Christoph Doerbeck 26 Aug '97

Greetings,

I've run into this scenario where a directory of "useful" stuff got deleted (about 7 gigs of "useful" stuff). At any rate, it all appears to be sitting in the hourly.1 snapshot as expected, but now I wonder... What is the best way to restore it?

I've disabled the snapshots while my "tar cf - | tar xpf -" does its thing, but is there a better way? I also wonder, since a "mv" blew away the directory in the first place (this is what I was told), can I "mv" the directory right back out of .snapshot?

So what is the BEST snapshot restore method?

++----------------------------------+
)) Christoph Doerbeck
(( Motorola ISG
)) email: doerbeck(a)dma.isg.mot.com
++----------------------------------+

Very slow fcntl() calls from Solaris to F230
by Brian Tao 20 Aug '97

I noticed a very serious problem this morning on one of our news servers with a Netapp-based spool (Ultra 170, 1GB RAM, Solaris 2.5.1 + Aug 16 rec. patches). That server was recently upgraded from a 2.5 Ultra running INN 1.4unoff4 to the 2.5.1 Ultra running INN 1.5.1.

Building or adding to the overview files takes a *very* long time. I trussed the overchan as well as an expireover process, and I'm seeing a long delay (on the order of a minute or more) after one of the open() calls. I believe it is sleeping on fcntl():

    [reading in article headers...]
    open("/news/spool/a/a/mis/talk/1944", O_RDONLY) = 4
    read(4, " P a t h : t o r - n n".., 8192) = 1293
    read(4, 0x0004D060, 8192) = 0
    fstat(4, 0xEFFFF5A8) = 0
    close(4) = 0
    open("a/a/mis/talk/.LCK.overview", O_WRONLY|O_CREAT|O_TRUNC, 0664) = 4
    open("a/a/mis/talk/.overview", O_RDWR|O_CREAT, 0664) = 5
    [big delay here]
    fcntl(5, F_SETLK, 0xEFFFF5AC) = 0
    fstat(5, 0xEFFFFAA0) = 0
    writev(4, 0xEFFFFB28, 1) = 7404
    rename("a/a/mis/talk/.LCK.overview", "a/a/mis/talk/.overview") = 0
    close(4) = 0
    close(5) = 0
    open("/news/spool/a/bsu/programming", O_RDONLY|O_NDELAY) = 4
    fcntl(4, F_SETFD, 0x00000001) = 0
    [continue with next newsgroup...]

It spends 99% of the time waiting for that fcntl() to return. The spool is mounted NFSv2, UDP. I've tried both hard and soft mounts. The same NFS configuration (AFAIK) worked fine on the old news server. I still have the old server online, and I can verify this (rm the overview file, then regenerate from scratch):

    old-server% time expireover -a -f /tmp/test.active
    0.03u 0.11s 0:00.14 100.0%

    new-server% time expireover -a -f /tmp/test.active
    0.04u 0.13s 1:17.25 0.2%

Over a minute to create a 44-line .overview file? lockd and statd are running on the Solaris side, and rpcinfo reports nlockmgr is registered. I must be missing something obvious, but I can't see it. :(

--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"
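To check whether the delay comes from the lock request itself rather than from INN, the lock can be timed in isolation. The following is a minimal Python sketch, assuming a Unix client with Python available; the function name `time_lock` is hypothetical, and any path passed to it is a placeholder rather than a file from the original report.

```python
import fcntl
import os
import time


def time_lock(path: str) -> float:
    """Return the seconds taken to acquire a write lock on path."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o664)
    try:
        start = time.monotonic()
        # With LOCK_NB, lockf issues a non-blocking fcntl(F_SETLK) request,
        # the same call that stalls in the truss output above.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        elapsed = time.monotonic() - start
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return elapsed
    finally:
        os.close(fd)
```

Running `time_lock` once against a scratch file on local disk and once against a scratch file on the NFS-mounted spool would show whether the time goes into the NFS lock request.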

filers and ClearCase
by matthew zeier 18 Aug '97

My group's looking to roll out a ClearCase implementation soon. We're a heavy NetApp house and obviously want to use our filers in any implementation we come up with. However, I'm getting mixed stories as to the role my filers can play with ClearCase.

On one hand I have my Sun sales guy telling me all about Sun's drive solutions and how the filer "simply won't cut it". He's convinced that ClearCase won't work on an NFS box altogether.

On the other hand I have confusion - PureAtria seems to be mixed on whether they can or can't (the instructor during the training class I was at had never heard of NetApp). NetApp, however, says their box does work and says that PureAtria uses filers in-house too.

If I've gathered my facts correctly, three out of the four ClearCase pools can sit on a filer, with the db part remaining on local disk. It's unclear to me how large the db can get, so I'm not sure how much local disk space I truly need. Does anyone have any metrics I can use? Local disk size per view, per vob?

What type of setups have you deployed, using the filer and ClearCase? What class of hardware are you using for how many users?

Any help you can give me is great. Thanks.

- matthew

--
matthew zeier -- mrz(a)3com.com -- 3Com EWD Engineering -- 408/764-8420
......................................................................
"To live and die, it seems, is a waste without a dream." - BoDeans

Re: Disk space arithmetic
by mds46523＠ggr.co.uk 12 Aug '97

On Aug 12, 0:55, Jim Davis wrote:
> Subject: Disk space arithmetic
> I'm curious about accounting for the disk space on our filer, which has 19
> "4gb" data disks.
>
> Naively multiplying 19 disks * 4000 mb/disk * 1024 kb/mb gives 77824000
> kb.
>
> Now subtract 10% for the FFS-ish reserve space; that leaves 70041600 kb.
>
> df on the filer shows 69682640 kb for / and /.snapshot combined.  That
> leaves 358960 kb -- what did I overlook?  Inode space?
>-- End of excerpt from Jim Davis

Disk manufacturers use metric megabytes...

    19 disks x 4 x 10^9 bytes/disk = 7.6 x 10^10 bytes = 76000000 KBytes

which shaves some off the difference. Having said that, most disks are actually 4.x GBytes, so you'll probably see the full 4 GBytes usable.

I didn't think there was any FFSishness left in the WAFL, though WAFL metadata evidently requires a fair bit of space, especially if well used.

Have you already deducted the parity and hot spare space?

    19 disks - 1 parity - 1 hot spare = 17 disks for user data
    17 disks x 4 x 10^9 bytes/disk = 6.8 x 10^10 bytes = 68000000 KBytes

which looks very close to your figure from df...

--
-Mark, TSG Unix admin and support, int 782 4412, ext +44 1438 76 4412.
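The two estimates in this message can be put side by side in a short Python sketch. All figures come from the posts above; the variable names are made up for illustration, and "KBytes" in the decimal estimate is taken to mean 1000 bytes, as the arithmetic in the reply implies.

```python
DISKS = 19
DF_TOTAL_KB = 69_682_640  # df figure for / plus /.snapshot, from Jim's post

# Jim's estimate: binary units (1 MB = 1024 KB), then a 10% reserve.
binary_kb = DISKS * 4000 * 1024
after_reserve_kb = binary_kb * 9 // 10
unexplained_kb = after_reserve_kb - DF_TOTAL_KB

# Mark's estimate: decimal units (1 KByte = 1000 bytes here),
# first for all 19 disks, then with parity and hot spare deducted.
decimal_kb_19 = DISKS * 4 * 10**9 // 1000
decimal_kb_17 = (DISKS - 2) * 4 * 10**9 // 1000

if __name__ == "__main__":
    print("binary estimate:     ", binary_kb)
    print("after 10% reserve:   ", after_reserve_kb)
    print("gap to df figure:    ", unexplained_kb)
    print("decimal, 19 disks:   ", decimal_kb_19)
    print("decimal, 17 disks:   ", decimal_kb_17)
```

The sketch only reproduces the arithmetic as posted; it does not settle which of the assumptions (reserve, units, parity and spare) matches how the filer actually accounts for space.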

Re: Read errors on a spare disk?
by sirbruce＠ix.netcom.com 12 Aug '97

On 08/11/97 14:37:07 you wrote:
>
>On Mon, 11 Aug 1997, Kenneth Whittaker wrote:
>>
>> Disk 9a.3 can only mean trouble in the future.  I recommend getting
>> disk 9a.3 out of your system, but before you pull a drive, call
>> technical support.
>
>    That's odd... as I mentioned, the error appeared on two of our
>filers, both on the hour immediately following a spare disk
>replacement.  Are you saying the new drives I plugged in both happen
>to be bad too?  I've already got two sitting on my desk to be
>returned, and I sure hope I don't need two more going back.

I don't understand Ken's response either. He should be well aware that there is a bug on this, where most of the time a disk will report a Unit Attention error on the hourly check after being swapped in... I filed the bug myself a year or two ago.

Bruce

Backing up multiple Netapps
by Brian Tao 12 Aug '97

What do people do for enterprise-wide Netapp backups? It's nice to plug a big tape drive into the dedicated SCSI port on each filer, but management becomes a hassle, and the hardware itself gets pretty expensive.

If you centralize everything to a backup server and do dumps over the network, what do you use to manage the scheduling and auditing of the tapes? Is there a way to get Amanda to work with a bunch of Netapps? Anyone using NDMP-based tools?

--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"

Read errors on a spare disk?
by Brian Tao 11 Aug '97

Why am I seeing problems with a disk that isn't even active? Does the Netapp do periodic media checks on the spare drives? I had just added this drive a couple of hours earlier that day, and I've only seen this one occurrence so far.

Fri Aug  8 13:00:00 EDT [isp_main]: Disk 9a.3(0x004e6290): READ sector 0 unit attention (6 29, 0)
Fri Aug  8 13:00:00 EDT [isp_main]: Disk 9a.3(0x004e6290): request succeeded after retry #1

RAID Disk  DISK_ID#  HA.SCSI#  Used (MB/blks)  Phys (MB/blks)
---------  --------  --------  --------------  --------------
parity        5      9a.1      4000/8192000    4095/8388312
data 1        4      9a.2      4000/8192000    4095/8388312
data 2        0      9b.0      4000/8192000    4095/8388312
data 3        1      9b.1      4000/8192000    4095/8388312
data 4        2      9b.2      4000/8192000    4095/8388312
data 5        3      9b.3      4000/8192000    4095/8388312
data 6        8      9a.4      4000/8192000    4095/8388312
data 7        9      9a.0      4000/8192000    4095/8388312
data 8        6      9b.5      4000/8192000    4095/8388312
data 9        7      9b.4      4000/8192000    4095/8388312
spare        10      9a.3      0               4095/8388312
spare        11      9a.5      0               4095/8388312

--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"
