> On Thu, 12 Jun 1997, Dave Hitz wrote:
> >
> > Although marketing certainly has input, the space restrictions also
> > come from engineering droids. During normal operation the smaller
> > machines could certainly handle more disk, but the time required for
> > something like RAID reconstruction could get dangerously long.
>
> Is this the reason why the F210 can still only use around 50GB of
> disk, even though it can physically accommodate more using 9GB drives?
In order for a RAID reconstruction to complete, you have to read all of
the data on all of the other disks. So RAID reconstruction time is
proportional to the amount of data, not proportional to the number of
disks.
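A back-of-the-envelope sketch of why that matters (the 5 MB/s effective
rate below is made up purely for illustration, not a NetApp figure):

    # reconstruction has to read all the data on all the surviving disks
    data_gb=108; mb_per_sec=5     # e.g. 12 x 9GB disks, or 24 x 4.5GB disks
    echo "$(( data_gb * 1024 / mb_per_sec / 3600 )) hours"   # prints "6 hours" either way

Double the total data and you roughly double the reconstruction window,
regardless of how many spindles it is spread across.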
Dave Hitz hitz(a)netapp.com
Network Appliance (408) 367-3106
One of my inn 1.4unoff4 news reader servers started throttling
itself just today with "Interrupted system call writing article file"
(happened twice in the past 24 hours). The spool is on an F230
running 4.0.3, 256MB of read cache and 4MB of write cache. The news
server is an Ultra 170, 512MB of RAM, ~250 to 300 readers around peak
times. The two are on a FDDI ring.
The F230 hovers around 65% CPU usage, so I don't think that's the
problem, but the Ultra is reporting 900 to 1200 packets per second
both in and out of its FDDI interface. Half of its time is spent in
the kernel, according to top(1). The mounts are NFSv3 over UDP.
Would dropping back down to NFSv2 help any? I'm trying to determine
if this is a network congestion problem, or an OS limitation (on
either the Netapp or the Sun).
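(If it helps to experiment, I figure I can force the older protocol on one
spool mount with something like the following -- Solaris syntax, and the
server and mount point names here are just placeholders, not my real ones:

    umount /news
    mount -F nfs -o vers=2,proto=udp,rsize=8192,wsize=8192 toaster:/home/news /news

-- and then watch whether the packet rate and kernel time change.)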
--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"
I'm following up on a thread from mid-June. You may recall that I asked
for assistance deciding between a couple of small NetApps and a Falcon
Systems FastFilePro 7000. I've included Andy Watson's response below for
context.
We made our decision to go with NetApp last week. Two key factors:
- NetApp is the market leader and has strong momentum. With NetApp, I'm
not concerned about owning an orphaned product, and I expect that the
product line will continue to develop rapidly. NetApp's large customer
base is a major asset because I feel comfortable that the toasters user
community will be a good source of information as well as a point of
leverage for ensuring that NetApp is responsive.
- NetApp's entry-level price is very low compared to that of the
FastFilePro 7000, and upgrades are easy and penalty-free.
Our plans have changed a bit -- we're buying an F210 now for a big customer
project, and the two we originally wanted will follow this fall.
Thanks very much everyone for sharing your thoughts with me.
-- Marc Rouleau
VP and Chief Technology Officer Voice: (812) 479-1700 Fax: (812) 479-3439
World Connection Services, LLC http://www.evansville.net
On Jun 13, 4:36pm, Andy Watson wrote:
> Marc --
>
> Given that no one else has offered you opinions (none that I've
> seen on this distribution list, anyway), I thought I ought to
> comment. I'll try not to exhibit too much of a NetApp vendor bias.
>
>> Apologies if this is off the charter -- let me know (gently, please!)
>> and I'll desist -- but I'm evaluating dedicated NFS servers and have
>> narrowed things down to Network Appliance (F220, qty 2) versus Falcon
>> Systems (FastfilePro 7000, qty 1). I must admit that I'm leaning
>> toward the 7000 due to its superior expandability and greater
>> administrative flexibility (multiple filesystems and RAID sets, ability
>> to run old drives via JBOD on the SCSI port). ...
>
> As I'm sure you are aware, NetApp has made design choices towards
> simplification. The goal is not to offer "administrative flexibility"
> but instead to provide a dedicated function appliance where admin
> is streamlined to a minimal set of tasks.
>
> For example, unlike a mundane file server that requires the
> administrator(s) to load balance among multiple file systems,
> and reallocate storage between them, a NetApp filer has a single
> file system that can be *software*-partitioned among subtrees
> such that space allocation can be managed on-the-fly by simply
> editing the tree quotas in /etc/quotas, and where no load-balancing
> whatsoever need be attempted. You can export the single file system
> as if it were multiple file systems by exporting subtrees with
> various permissions and options specified in the /etc/exports file.
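> For instance (an illustrative sketch only -- the tree names are
> hypothetical, and the exact syntax for /etc/quotas and /etc/exports
> is in the System Administrator's Guide):
>
>     # /etc/quotas -- disk limits in KB for each quota tree
>     /home/eng     tree    4000000
>     /home/sales   tree    2000000
>
>     # /etc/exports -- export each subtree with its own options
>     /home/eng     -access=eng-hosts,root=adminhost
>     /home/sales   -access=sales-hosts
>
> Growing one group at the expense of another is just an edit to the
> quota limits; no repartitioning or data movement is involved.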
>
>> ... Also, the 7000 tests
>> about even with the F540 in various magazine reviews, and NetApp's own
>> LADDIS testing shows the F220 to be only half as fast as the F540 (is
>> this true in the real world?).
>
> Well, in at least one magazine review (I think it was UNIX REVIEW,
> as I recall) a benchmark called "bigdisk" was used, which showed
> Falcon's performance as comparable to a NetApp filer. Other than
> that I don't recall other reviews where Falcon matched NetApp.
> Anyway, in that article, bigdisk exercised the Falcon using NFSv2
> over UDP, because at that time Falcon's software did not support
> NFSv3 or NFS over TCP. NetApp was tested with NFSv3 over TCP.
> Running NFS over TCP is known to hurt performance compared to UDP,
> but TCP is provided because it is better for WANs and other "lossy"
> networks where packet loss can cause heavy retrans overhead.
> Anyway, we ran bigdisk internally, using NFSv2 over UDP, and
> our performance was significantly better than what that magazine
> reported for Falcon.
>
> On the broader topic of benchmarking, the industry-standard
> benchmark of NFS file server performance is SFS (also known as
> "LADDIS") from SPEC. SPEC has a peer review process whereby
> official results are submitted, and if accepted by SPEC, can
> be published in the SPEC Newsletter and on the SPEC website --
> (see http://www.specbench.org/osg/sfs93/results/results.html).
> You will find Falcon and NetApp SFS benchmark results at that
> site (and in the hardcopy Newsletter, if you have access to it).
> NetApp has also published SFS results which have been accepted
> by SPEC, but which have been inexplicably delayed in appearing
> on the SPEC website. SPEC assures me that they will appear
> very soon, probably by early next week. In the meanwhile, you
> can see our latest results for our F210, F230, F520, and F630
> filers at http://www.netapp.com/technology/performance.html.
>
> Falcon has published four results via SPEC, but none of them
> are annotated as being for a model "7000", so I'm not sure
> what that machine's SFS benchmark results might be. Also,
> in 3 out of those 4 tested configurations, Falcon used an
> SSD (solid-state disk) to significantly accelerate their
> performance. If the proposed Falcon configuration you are
> considering does not include that SSD, then you should ask
> to have that added if you want to optimize that system's
> performance. On the 4th Falcon configuration for which they
> have published SFS results, instead of SSD, they combined
> a BBU (Battery Backup Unit) with a Mylex disk controller's
> RAM cache to accelerate Write performance. If you are
> considering that approach, instead of using SSD, be sure not
> to omit the BBU, because otherwise pending Writes (asynchronously
> acknowledged to the clients) could be lost in the event of
> a power interruption. (NetApp uses battery-backed NVRAM
> (non-volatile RAM) to capture and preserve updates to the
> file system, even in the event of an interruption in power.)
>
> The SPEC results will show you that our filers outperform Falcon's
> servers in terms of average response time. This means we provide
> faster replies to clients. Also, Falcon has only published results
> for one configuration where parity RAID protection was enabled.
> That happens to be for a model 9000 system with 62 disks, with
> three RAM-caching Mylex controllers, and an SSD. Even so, two
> F220s together will present more total ops/s (1213 ops/s each,
> or 2426 together) than the large-config Falcon 9000 (2392 ops/s).
> Each F220 needed only 14 disks to achieve this performance,
> so even two of them together were exercising less than half the
> spindles of the high-end Falcon. And with our latest software
> release, and the newer F230 filer, the performance has improved
> both in terms of faster response time (4.5 ms for 1221 ops/s,
> compared to the F220's 9.4 ms for 1213 ops/s) and greater max
> throughput (1610 ops/s).
>
> But this is benchmarking numerology, to some extent. You are
> wise to ask about real-world performance. We have consistently
> found that NetApp filers deliver better performance for actual
> application environments than indicated by the SFS benchmark
> because the benchmark has a "worst-case" workload, compared
> to most application requirements. Also, unlike other vendors
> (including Falcon) where the SFS benchmark load is *perfectly*
> load-balanced across many file systems, NetApp is benchmarking
> with a single file system. Real-world applications depend on
> the performance of the individual file system where the target
> data resides. It is rare to find an individual application that
> is actively exercising data stored in multiple file systems.
>
> I've written a paper which goes into greater detail on this
> topic: "Interpreting SPEC SFS ('LADDIS') Benchmark Results"
> (http://www.netapp.com/technology/level3/3010.html). It doesn't
> specifically mention Falcon, but I think you will find it useful
> in the analysis of your requirements and the comparison of NetApp
> filer performance versus benchmark results for multiple-file system
> configurations such as Falcon's.
>
>> On the other hand, NetApp tells me that the highly random nature of my
>> traffic -- typical ISP work including email, USENET news, and webservers --
>> should point me toward multiple servers to maximize main memory cache hits.
>> They say that the main memory cache on the 7000 will become worthless as I
>> expand storage and that performance will drop off. I don't have a good
>> feel for the importance of main memory cache here -- the 7000 uses hardware
>> RAID controllers with onboard cache, so it seems like it would be less
>> important to the Falcon product than it is to the NetApp line. But I must
>> admit that NetApp's larger installed base -- 1000's versus 100's of
>> systems -- makes it a safer choice on the surface.
>
> On the above topic, you should check out Karl Swartz's paper:
> "The Brave Little Toaster Meets Usenet" (a USENIX LISA paper --
> see http://www.netapp.com/technology/level3/brave.html).
>
> In general, for large data sets and the highly randomized loads seen
> by growing ISPs, it is futile to try to solve your I/O problems with
> larger caches. Let's say you have 20 GB of data that is being accessed
> by a large population of diverse users, such that there is relatively
> little likelihood of data re-use from the cache. If even only 20% of
> that 20 GB is exercised, then you would need 4 GB of RAM, and you'd
> merely fill it with data that is retrieved once and never needed again,
> as the next 4 GB of different data flows through the cache over the
> next equivalent time interval.
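> (As a rough sketch: with uniformly random access across a working set,
> the cache hit rate is roughly the cache size over the working-set size.
> The numbers below are hypothetical:
>
>     working_set_gb=5; cache_gb=1
>     echo "$(( cache_gb * 100 / working_set_gb ))% hit rate"   # prints "20% hit rate"
>
> so even a big cache buys little until it approaches the working set.)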
>
> So what you really need to focus on is disk subsystem performance.
> And that's where we excel. Note that we have published all our
> benchmark results with relatively small disk complements, giving
> us the best per-disk (and per-file-system) performance results
> in the industry.
>
>> Anyone care to share his/her thoughts on this topic?
>
> Well, I hope this helped. I tried to provide as much relevant
> information as I could, with references, and to avoid spouting
> unsubstantiated opinions.
>
> Good luck in your exploration of alternatives!
>
> -- Andy
>
> Andy Watson
> Director, Technical Marketing watson(a)netapp.com
> Network Appliance +1 408 367 3220 voice
> 2770 San Tomas Expressway +1 408 367 3151 fax
> Santa Clara, CA 95051 <http://www.netapp.com/>
>
> "It's really stupid to be an intellectual when you're young.
> You should be an intellectual when you're a hundred years old
> and can't feel anything anymore."
> -- a character in Bruce Sterling's novel, HOLY FIRE
Is anyone out there using their NetApp to house Oracle, Informix or
Sybase databases? If so, what have been the results? Any feedback on
this issue would be greatly appreciated.
Thanks,
Tim Lewis
I have a 220 with 4.0.3 and 37 quota trees.
These mount fine on all our machines (Solaris, AIX, Linux,
Ultrix, Digital UNIX), except on SGIs with IRIX 6.2
or higher, regardless of model. On IRIX 5.2 or 5.3, there is no
problem at all.
Upon doing 'mount -a' I get the following errors:
mount: chatelier:/mn/chatelier/u37 already mounted
mount: chatelier:/mn/chatelier/u36 already mounted
mount: chatelier:/mn/chatelier/u35 already mounted
mount: chatelier:/mn/chatelier/u20 already mounted
chatelier:/mn/chatelier/u34 mounted on /mn/chatelier/u34
chatelier:/mn/chatelier/u33 mounted on /mn/chatelier/u33
chatelier:/mn/chatelier/u32 mounted on /mn/chatelier/u32
mount: chatelier:/mn/chatelier/u30 server not responding: Timed out
mount: chatelier:/mn/chatelier/u31 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u31
mount: giving up on:
/mn/chatelier/u30
mount: chatelier:/mn/chatelier/u29 server not responding: Timed out
mount: chatelier:/mn/chatelier/u28 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u29
mount: giving up on:
/mn/chatelier/u28
mount: chatelier:/mn/chatelier/u27 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u27
mount: chatelier:/mn/chatelier/u26 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u26
mount: chatelier:/mn/chatelier/u25 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u25
mount: chatelier:/mn/chatelier/u24 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u24
mount: chatelier:/mn/chatelier/u23 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u23
mount: chatelier:/mn/chatelier/u22 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u22
mount: chatelier:/mn/chatelier/u21 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u21
mount: chatelier:/mn/chatelier/u19 server not responding: Timed out
mount: giving up on:
/mn/chatelier/u19
As you can see, the SGI will mount some trees and time out on the
rest. Each time I run 'mount -av', some more trees will mount, and
the rest will time out.
Has anyone else seen this?
--
Ståle Johansen, University of Oslo.
Hello everyone,
I am quite perplexed by the behavior I am seeing below:
david@vortex ~
$ find * -type f -print | sed -e 's%.*%ls -ldtr "&" .snapshot/*/"&"; echo%' | bash > messed.up.files
.snapshot/*/Mail/.dir.tiff: No such file or directory
.snapshot/*/NeXT/Apps/.dir.tiff: No such file or directory
.snapshot/*/NeXT/Apps/.opendir.tiff: No such file or directory
.snapshot/*/NeXT/Apps/Icon.app/brushes/.places: No such file or directory
.snapshot/*/NeXT/Apps/Icon.app/help/.places: No such file or directory
.snapshot/*/NeXT/Apps/Icon.app/help/.list: No such file or directory
^C
david@vortex ~
$ ls -l .snapshot/*/NeXT/Apps/Icon.app/help/.list
-rw-r--r-- 1 david staff 1095 Nov 14 1992 .snapshot/hourly.0/NeXT/Apps/Icon.app/help/.list
-rw-r--r-- 1 david staff 1095 Nov 14 1992 .snapshot/hourly.1/NeXT/Apps/Icon.app/help/.list
-rw-r--r-- 1 david staff 1095 Nov 14 1992 .snapshot/hourly.2/NeXT/Apps/Icon.app/help/.list
-rw-r--r-- 1 david staff 1095 Nov 14 1992 .snapshot/hourly.3/NeXT/Apps/Icon.app/help/.list
-rw-r--r-- 1 david staff 1095 Nov 14 1992 .snapshot/weekly.0/NeXT/Apps/Icon.app/help/.list
Would anyone care to comment? This is greatly disturbing.
Regards,
David K. Drum
david(a)more.net
--
"That man has a rare gift for obfuscation." -- ST:DS9 * "It's hard to
be bored when you're as stupid as a line." -- Vernor Vinge * "Reality
has a tendency to be so uncomfortably real." -- Neil Peart * "You can
only measure the size of your head from the inside." -- Larry Wall
Among our NetApps we have an F330 which will soon be put into service as a fileserver
for our internal use. We have PCs, various UNIX workstations, and a few Macs. The Macs are
giving me a headache - the NetApp doesn't support AppleTalk, so we are pondering the
alternatives:
We've tried 'DAVE' from Thursby Systems, which gives the Mac CIFS client
capabilities. This works fine with NT servers; however, it does not work with NetApps.
(This is noted in bug #4140 - with no indication of whose problem it is, or, for that
matter, when it might be fixed.)
Other options include NFS client software on the Macs, or setting up a gateway which
mounts the NetApp via NFS and re-exports the filesystems via AppleTalk (using Linux and
the free AppleTalk package, for example).
Has anyone tried any of these solutions? What will work?
regards,
--
---Ketil Kirkerud, Scandinavia Online
I synergistically leverage my paradigms
At 05:16 PM 7/7/97 -0400, Christoph Doerbeck wrote:
>I'm looking for input regarding why I might (or might not) encourage
>my users to allow me to upgrade the filer OS from 3.1c to 4.0.1c.
>What features are we missing? What pitfalls can I expect? I just
>want to cover my bases...
>
>What we have currently is very stable.
As far as I am concerned, there is one major advantage to the 4.x series.
It can serve files over CIFS. It can also serve files over HTTP, but
that's much less important to me.
We have a large and growing number of PCs, and installing NFS software on
them is a pain in the rear. Being able to serve files over CIFS is very
convenient.
If you don't need CIFS or HTTP file service, then it's a much closer call.
'course, 4.x is their current release, and any performance tweaks they make
are gonna go into 4.x, which will probably make it a bit faster, and
potentially a bit less stable. If you're performance hungry, go with 4.x.
If you absolutely have to stay up, ask them for their current list of known
bugs in 3.1c and 4.0.1c, read them carefully, and decide which set you're
less likely to trigger.
They're pretty good with their known bugs list.
Amy
Had something similar happen to us. One of our Netapps failed a
drive and reconstructed its data onto a different drive (from the hot
spare pool). The failed drive should have been replaced asap but it
fell through the cracks and was forgotten.
A week later we shut it down to replace an fddi card and afterwards
it wouldn't boot up. The failed drive was apparently working well
enough so that the Netapp thought it had a RAID drive that wasn't a
valid member of the array (inconsistent disk labels). Once we removed
the problem drive the Netapp booted just fine.
I don't understand why your bad drive was added to the hot spare
pool upon reboot. It should have had a label that was inconsistent
with the other drives and the Netapp shouldn't have booted.
>> Could the Netapp somehow mark a bad drive so that the information
>> is kept across boots?
If a failed drive is working after a reboot then its label should
be inconsistent with the other drives in the array and the Netapp
shouldn't boot.
NOTE: For those wondering what label I am talking about here is an
excerpt from the System Administrator's Guide chapter on
Troubleshooting:
The Netapp writes a label on each disk indicating its position
in the RAID disk array.
regards,
Steve Gremban gremban(a)ti.com
PS: I noticed that you are running without a hot spare configured.
We normally configure a spare in order to minimize time spent
at risk in degraded mode.
Do you consider the risk minimal that another drive will
fail before the first one is replaced and rebuilt, or do you
have some plan in place to ensure that someone is notified
immediately? What about early in the morning, on weekends,
or holidays?
Anyone else out there running without hot spares?
------------- Begin Forwarded Message -------------
From madhatta(a)turing.mathworks.com Mon Jul 7 15:40:34 1997
Date: Mon, 7 Jul 1997 16:34:26 -0400 (EDT)
From: Brian Tao <taob(a)nbc.netcom.ca>
To: toasters(a)mathworks.com
Subject: Marking failed drives across boots?
MIME-Version: 1.0
We had a problem with our Netapp this morning that could
potentially be quite serious. One drive near the beginning of the
chain (ID 2, I believe) was failed out by the Netapp. Very shortly
thereafter, the filer crashed with a RAID panic and rebooted. Upon
rebooting, it noticed that drive ID 2 was not actively being used, and
proceeded to add it to the hot spare pool. Then it began
reconstructing the data on to (you guessed it) drive ID 2.
In this scenario, there was no time to pull out the bad drive, and
the Netapp happily rebuilt the data on it. I guess the correct
procedure now is to forcibly fail that drive and rebuild to our good
spare drive, and remove drive ID 2. Could the Netapp somehow mark a
bad drive so that the information is kept across boots?
--
Brian Tao (BT300, taob(a)netcom.ca)
"Though this be madness, yet there is method in't"
------------- End Forwarded Message -------------
> Anyone else out there running without hot spares?
Now, no. In the past, yes. We were space-hungry and had maxed out the
number of disks in some of our earlier, smaller NetApps, so we threw the
spares into the pool because we kept tipping over the 100% full mark and
not being able to write anything, and that was bad.
As it turned out, we were okay. The odds of two disks going bad before you
get a chance to replace one of them are fairly minimal, especially if you
set the thing to send you e-mail notifying you of a bad drive.
Especially if you set it to send e-mail to both your mailbox and your pager.
Amy -- who now has enough co-workers to not have to carry a pager
24x7x365 anymore, but who does it anyway.