Recently, we encountered a serious problem in a high performance NFS
environment using large Solaris servers as NFS clients. Due to the
fantastic assistance of several people who have elected not to be named,
we have identified the problem and a resolution for it. Since others
on this list are likely to use Solaris servers as NFS clients in
high performance environments, it was thought that this information
might be generally useful.
Situation and Symptoms:
We were running on E-class Suns using Solaris 2.6 with what we thought
were well-considered kernel tuning parameters (cranked up nfs:nfs_max_threads,
maxusers, and rlim_fd_{max,cur}). The NFS-mounted storage was on NetApp
F760s, reached through Sun's Gigabit Ethernet interfaces and a high quality
Gigabit Ethernet switch.
When we turned the application on, we got stunningly low throughput
numbers, and couldn't figure out why.
Client CPU load looked healthy: most of the processing power was idle
(and not in WAIT state), memory utilization was low, the filers were
running at low utilization, and there were no network or I/O bottlenecks
in evidence.
We were running two applications that do reads, writes, appends, creates,
and deletes on relatively small files over NFS.
Both applications have fairly similar characteristics and perform
computations on the same file sets. One of them was running just fine.
The other was running well at light loads, but had horrible problems
as the load went up. It appeared that some fixed resource was being
consumed, and once we went over a certain load threshold, the number
of processes grew exponentially while the amount of work being done
remained constant. Eventually, these extra processes exhausted main
memory and the machine began to thrash.
Solution:
Since we don't have access to Sun source, we can't be 100% certain of
what was happening, but this is the best information we have. We think
what we describe here is exactly what's going on, but there might be
some minor variations. If someone here knows Sun internals, maybe they can
fill in the gaps.
Basically, we were having problems with the Directory Name Lookup
Cache (DNLC) in Solaris. It seems that between Solaris 2.5.1 and
Solaris 2.6 the formula for sizing this cache changed. Here's the
math, as best I know it:
2.5.1 DNLC size = (max_nprocs + 16 + maxusers) + 64
2.6 DNLC size = 4 * (max_nprocs + maxusers) + 320
According to Cockcroft's Sun Performance and Tuning book,
max_nprocs = 10 + 16 * maxusers
We don't know if this calculation changed between 2.5.1 and 2.6.
Wanting large numbers of processes and large buffer sizes, we set
maxusers=2048. This means that for Solaris 2.5.1, the DNLC size would
be 34906 (or so) and for Solaris 2.6, the same maxusers variable would
yield a size of 139624.
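For the curious, the arithmetic, assuming the formulas above are right,
works out like this:

    max_nprocs      = 10 + 16 * 2048           = 32778
    2.5.1 DNLC size = (32778 + 16 + 2048) + 64 = 34906
    2.6 DNLC size   = 4 * (32778 + 2048) + 320 = 139624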
Now, as we understand it, this is a LINEAR lookup table. Further, when
deleting a file, an entry in this table must be looked up and locked.
This was the distinction between our two applications: one does a single
delete before completion, the other does three deletes. With a table
this size, there seems to be a hard limit on the number of deletes per
second one can perform over NFS, and we hit it. We put "set ncsize = 8192"
in /etc/system, rebooted, and the problem went away. We played with
sizes ranging from 4096 to 32768 and saw no huge performance difference
(8192 SEEMED best, 32768 SEEMED worst, but that was a VERY subjective
evaluation), and we saw no significant difference in our DNLC cache hit
rate as measured by "vmstat -s".
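For reference, the /etc/system change is just the one line (lines starting
with an asterisk are comments):

    * Cap the DNLC at 8192 entries instead of the value derived from
    * maxusers (ncsize is the kernel variable behind the DNLC size)
    set ncsize = 8192

If you want to confirm the running kernel picked the value up after the
reboot, poking at ncsize with adb is one way we know of to do it (it
prints the variable in decimal):

    echo "ncsize/D" | adb -k /dev/ksyms /dev/mem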
Additional Information:
One way to check whether you're experiencing this problem is to run
"/usr/ucb/ps -uaxl" and look at the WCHAN column. To the best of
our knowledge, Sun doesn't publish a translation guide to these event
names, but we have it on good authority that if you see one called
"nc_rele_" (which is a truncation of "nc_rele_lock") the process is
waiting for DNLC entries to become unlocked. Note: On healthy machines
processes will sometimes show up in this state, but if a significant
percentage of them are in this state, that may indicate this problem.
No, I can't accurately define "significant percentage". I doubt 1% is a
problem; I know that 30% or higher is a problem.
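If you'd rather have a rough number than eyeball the ps output, something
like the following works; it just pattern-matches the WCHAN string, so
treat the result as an estimate rather than a precise test:

    /usr/ucb/ps -uaxl | awk '
        NR > 1     { total++ }
        /nc_rele_/ { stuck++ }
        END { printf "%d of %d processes waiting on nc_rele_\n", stuck, total }'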
Also, Sun has a BugId for this problem, 4212925. As of August 2, they
have a patch for 2.6, numbered 105720-08. It seems to do a lot
of things, but the explanation of it isn't as revealing as we'd like,
so we're a little leery of putting it into production without extensive
testing. We're playing with it, but can't comment on its efficacy at
this time. It might hash the DNLC table (that would be the right solution,
and word is this will happen in 2.8), but for all I know it may just revert
to 2.5.1's DNLC math.
Summary:
If you're running Solaris 2.6 or 2.7 in a high performance environment where
a lot of files are being deleted over NFS, make sure your DNLC is not
too large or you'll have HUGE problems. Trust me.
Hope this helps someone.
--
Nick Christenson
npc(a)sendmail.com