If I had to guess here... I would say that the GigE to FastE conversion might be
the issue with large transmit windows (v3).
UDP has no throttling mechanism built in the way TCP does... and you could be
overrunning the conversion process.
More stats would help, but I'm only guessing.
The Gig to FastE hop perhaps isn't the best way to do this at all, but it will
work as long as you don't overrun the switching mechanism.
Now if this were <xxx> to FDDI, I would be concerned, because the packet
fragmentation and reconstruction will easily overwhelm most switches.
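Two things worth trying on the Solaris client while you gather stats. A 32K
NFSv3 request over UDP becomes roughly 22 IP fragments on a 1500-byte MTU, and
losing any one fragment costs the whole RPC, so shrinking the transfer size
(or switching to TCP, which throttles itself) can tell you a lot. A rough
sketch only -- the filer name and mount point below are placeholders for your
setup:

    # Option 1: smaller UDP transfers, fewer fragments per RPC.
    umount /mnt/db
    mount -F nfs -o vers=3,proto=udp,rsize=8192,wsize=8192 \
        filer:/vol/vol0 /mnt/db

    # Option 2: NFSv3 over TCP, which backs off under loss
    # (assuming NFS-over-TCP is enabled on the filer).
    mount -F nfs -o vers=3,proto=tcp filer:/vol/vol0 /mnt/db

If the "not responding" messages disappear at 8K or over TCP, that points
pretty strongly at fragment loss on the Gig-to-FastE step.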
-----Original Message-----
From: Krishnan Prabhakar [mailto:kpkar@netapp.com]
Sent: Friday, July 28, 2000 12:55 PM
To: foo+netapp@eek.org
Cc: toasters@mathworks.com
Subject: Re: Performance probs with UDP NFSv3.
> The network at the moment is like this:
>
> [760]
flow-control needs to be turned ON (see the command sketch after the diagram)
> |
> Gig-e
> |
> |
> [Foundry FastIron]
flow-control needs to be turned ON on the port
> |
> |
> 100Mb/s Ethernet
> |
> |
> [Foundry FastIron]
flow-control needs to be turned ON
> |
> |
> Gig-e
> |
flow-control needs to be turned ON
> [Sun e4500]
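The flow-control knobs differ at each hop. A sketch of what they look like
(interface names are placeholders, and the exact syntax varies by OS and
firmware release, so check your platform docs before trusting any of these):

    # NetApp filer -- assuming your ONTAP release supports the
    # flowcontrol option on the gigabit interface:
    ifconfig e10 flowcontrol full

    # Foundry FastIron, per port, from interface config mode:
    interface ethernet 1
     flow-control

    # Solaris ge driver -- advertise TX/RX pause frames:
    ndd -set /dev/ge instance 0
    ndd -set /dev/ge adv_pauseTX 1
    ndd -set /dev/ge adv_pauseRX 1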
I guess the bottleneck is the 100BT link sandwiched between Gig-E links that
are 10x faster on both sides. (The FastIron can't forward packets out the
100Mb/s port as fast as it receives them on the 1000Mb/s port, i.e. from the
filer/e4500.) Usually it's done the other way around
(100BT -- 1000SX pipe -- 100BT).
The above network topology is not 'appropriate' for a server network!
I would guess there are lots of drops/retransmits at both ends ('netstat -s').
Note: this is my first guess based on the info in this thread.
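To check that, the counters worth pulling on the Sun side look roughly like
this (a sketch -- the grep pattern is a guess at the relevant Solaris 2.7
counter names, so eyeball the full output too):

    # NFS/RPC client stats: a high "retrans" count with a low "badxid"
    # count usually means the network is eating requests; high badxid
    # would instead point at a slow server answering late.
    nfsstat -rc

    # IP/UDP stats: reassembly failures and input overflows are where
    # lost fragments of 32K UDP requests show up.
    netstat -s | egrep -i 'Reasm|udpIn'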
>
>
> The reason for the 100Mb in between is that, for the moment, the 4500 and
> the filer are in different parts of a building (power issues prevent
> having them in the same place), and those parts are connected via 100Mb.
>
> That said, I spent a lot of time looking at the switches and network to
> see if it was causing the problems. The fast ethernet interface has never
> gone above around 30Mb or so, and there are no errors of any kind on any of
> the ints (no collisions, no align/fcs/giant/short, nothing *at all*).
> Everything is full-duplex, flowcontrol is off on everything (any thoughts
> about that?), all of the cabling was tested before installation... in
> short, the network looks right.
>
> Eventually the filer and DB will be connected directly to each other, but
> I have trouble believing that this will solve the current problems
> considering there is currently no congestion in the network.
>
> The DB has all of the recent patches, but no optimization has been done.
>
> What am I missing?
>
> -Brian
>
> foo wrote:
> > Has anyone else experienced performance problems using UDP NFSv3 between
> > Solaris 2.7 and an F760 (running 5.3.5R2)?
> >
> > Using v3 I get the following errors intermittently:
> >
> > Jul 27 20:41:07 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
> > Jul 27 20:41:07 bullwinkle unix: NFS server 10.20.10.10 ok
> > Jul 27 20:42:06 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
> > Jul 27 20:42:06 bullwinkle unix: NFS server 10.20.10.10 ok
> > Jul 27 20:43:30 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
> > Jul 27 20:43:30 bullwinkle unix: NFS server 10.20.10.10 ok
> > Jul 27 20:45:40 bullwinkle unix: NFS server 10.20.10.10 not responding still trying
> > Jul 27 20:45:40 bullwinkle unix: NFS server 10.20.10.10 ok
> >
> > When I switched the mount to v2 (udp), these errors went away, and
> > performance seemed to increase. NetApp documentation recommends v3, but it
> > doesn't seem to work very well in my environment.
> >
> > Since this filer is used for a DB, this is causing huge problems. Should I
> > just leave it at v2, or is it worth trying to determine why v3 is performing
> > poorly?
> >
> > -Brian