I have a question on scheduling which you folks might have had some
experience of. It's more of an appeal for help, really.
We have a distributed POP/IMAP server which stores its data on a central NFS
server. The app uses many (up to 100) pthreads, each of which might be
issuing a disk op at any time. Typically the pthreads are bound to an OS
thread/LWP.
We have acquired a non-exclusive file lock on the files to disable
client-side caching.
Under heavy load, what we see is that some individual write/read calls take
an age to complete (up to several minutes). During that period, many other
read/write calls from other threads complete quickly. When the load lets
up, the long ops complete. This behaviour
becomes increasingly common as we increase the number of threads in the
application - with 1 thread it doesn't happen, with 10 it's quite common,
with 50 it's unbearable.
We see this behaviour:
- With both NetApp and Solaris as the NFS server. It takes more stress for
it to kick in with NetApp, probably because NetApp is faster.
- With various OSs as the NFS client (Solaris, HP, Linux).
It's almost as though last-in-first-out thread scheduling is going on
somewhere. Although it feels like a client-side issue, we can't tie it
down to any one OS. We have some evidence that the behaviour may be better
with NFS v2 than with v3.
When this is occurring, it's not that total throughput in disk read/write
ops is poor - it's only a little below the peak we see - but that the
scheduling is unfair. This is a problem for us because some of the disk ops
are performed while holding a locked global resource, so if such a disk op
is delayed for long, we get heavy contention on the resource, response times
to individual user requests suffer, and so on.
Have any of you seen anything like this?
Edward Hibbert
DCL.