toasters December 2000

toasters@lists.teaparty.net

112 participants
116 discussions

Solaris persistently trying to write to a snapshot
by Chris Thompson 26 Dec '00

I wonder whether anyone else has seen the following effect? I made a casual reference to the problem back in May, but it's just jumped up and bit us rather hard. [Actually, it was before Christmas, but I didn't get round to writing it up then!]

NFS server: F740 running ONTAP 5.3.7R1
NFS client: UltraSPARC (E220R) running Solaris 8 (well patched)
[but the problem has been around since 5.3.5 & Solaris 2.6, at least: probably much longer]

In some circumstances, the Solaris kernel can acquire the notion that it has pending NFS writes (apparently in the "buffer cache") to be performed on a file in a snapshot. When such a write is rejected by the filer with EROFS, Solaris logs the error... and tries it again after 30 seconds. And again. And again ...

On one recent occurrence of this, as the result of a genuine user "error", the messages came so thick and fast that we had to reboot the client machine. The following, by contrast, was a controlled experiment!

Test file:

  $ ls -li test/test
  3715488 -rw-r--r-- 1 cet1 cet1 21167 Dec 23 15:50 test/test

A snapshot was taken at 16:00. Then the command

  echo x >>test/.snapshot/hourly.0/test

which one might reasonably expect to fail, completed quietly. But then the kernel messages started:

  Dec 23 18:00:43 draco.cus.cam.ac.uk nfs: [ID 808668 kern.notice] NFS write error on host puppis-intracus: Read-only file system.
  Dec 23 18:00:43 draco.cus.cam.ac.uk nfs: [ID 702911 kern.notice] (file handle: e1b43900 13b1f03 20000b00 38b1a0 a8268f00 fb380000 40000000 c7741500)
  Dec 23 18:00:43 draco.cus.cam.ac.uk nfs: [ID 808668 kern.notice] NFS write error on host puppis-intracus: Read-only file system.
  Dec 23 18:00:43 draco.cus.cam.ac.uk nfs: [ID 702911 kern.notice] (file handle: e1b43900 13b1f03 20000b00 38b1a0 a8268f00 fb380000 40000000 c7741500)
  Dec 23 18:01:07 draco.cus.cam.ac.uk nfs: [ID 808668 kern.notice] NFS write error on host puppis-intracus: Read-only file system.
  Dec 23 18:01:07 draco.cus.cam.ac.uk nfs: [ID 702911 kern.notice] (file handle: e1b43900 13b1f03 20000b00 38b1a0 a8268f00 fb380000 40000000 c7741500)
  Dec 23 18:01:37 draco.cus.cam.ac.uk nfs: [ID 808668 kern.notice] NFS write error on host puppis-intracus: Read-only file system.
  Dec 23 18:01:37 draco.cus.cam.ac.uk nfs: [ID 702911 kern.notice] (file handle: e1b43900 13b1f03 20000b00 38b1a0 a8268f00 fb380000 40000000 c7741500)
  ...

continuing apparently indefinitely. I was quite expecting to have to reboot this client machine as well, but coming back after Christmas I find they eventually terminated:

  ...
  Dec 25 15:59:37 draco.cus.cam.ac.uk nfs: [ID 808668 kern.notice] NFS write error on host puppis-intracus: Read-only file system.
  Dec 25 15:59:37 draco.cus.cam.ac.uk nfs: [ID 702911 kern.notice] (file handle: e1b43900 13b1f03 20000b00 38b1a0 a8268f00 fb380000 40000000 c7741500)
  Dec 25 16:00:07 draco.cus.cam.ac.uk nfs: [ID 626546 kern.notice] NFS write error on host puppis-intracus: Stale NFS file handle.
  Dec 25 16:00:07 draco.cus.cam.ac.uk nfs: [ID 702911 kern.notice] (file handle: e1b43900 13b1f03 20000b00 38b1a0 a8268f00 fb380000 40000000 c7741500)

The snapshot had been deleted, and the change from an EROFS to an ESTALE seems to have finally persuaded the Solaris kernel to stop retrying! [The file handle contents agree perfectly with the expected value for the snapshot of test/test, by the way, based on the description that Guy Harris gave last May.]

I suppose this has to be seen primarily as a Solaris problem, but I wonder what attributes of the NetApp filer are confusing it.

Q1. Why doesn't the open for writing of "test/.snapshot/hourly.0/test" fail? Is Solaris being confused by parts of a single NFS filing system being read-only, but not all of it? Or by the duplicate inode numbers of "test/test" (which would certainly have been cached) and "test/.snapshot/hourly.0/test"?

Q2. Why does Solaris go on trying the write for so long? [And why keep on logging it so persistently? :-( ]

I'll have to open a call with Sun about this in the new year, and would like to understand the problem better myself by then.

Chris Thompson
University of Cambridge Computing Service, Email: cet1(a)ucs.cam.ac.uk
New Museums Site, Cambridge CB2 3QG, Phone: +44 1223 334715
United Kingdom.
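
A minimal sketch for tallying retry messages like the ones quoted in this post, assuming the two-line pattern seen in the log excerpt (an "NFS write error" line followed by a "(file handle: ...)" line). The function name and the regular expressions are illustrative assumptions, not anything provided by Solaris or ONTAP.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the log excerpt in this post.
ERROR_RE = re.compile(r"NFS write error on host ([^\s:]+): (.+?)\.?\s*$")
HANDLE_RE = re.compile(r"\(file handle: ([0-9a-f ]+)\)")


def summarize_nfs_write_errors(lines):
    """Count (host, error text, file handle) triples in syslog lines.

    Each error line is paired with the next file-handle line that follows it.
    """
    counts = Counter()
    pending = None  # (host, error text) still waiting for its file handle
    for line in lines:
        error = ERROR_RE.search(line)
        if error:
            pending = (error.group(1), error.group(2))
            continue
        handle = HANDLE_RE.search(line)
        if handle and pending is not None:
            counts[(pending[0], pending[1], handle.group(1).strip())] += 1
            pending = None
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log path): python tally.py < /var/adm/messages
    for (host, error, handle), n in summarize_nfs_write_errors(sys.stdin).most_common():
        print(f"{n:6d}  {host}  {error}  [{handle}]")
```

Grouping by file handle makes it easy to see whether a flood of messages comes from one stuck write, as in the experiment described here, or from several files.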

RE: Bouncing autosupport?
by Brar, Suman 24 Dec '00

Our production system is functioning properly and all autosupports are being processed. These bounces were caused by one of our development systems that is used for stress testing the capabilities of the email system for autosupport. The developer forgot to throttle the outgoing bounces. Sorry for the inconvenience.

I am responsible for the Autosupport program and welcome your feedback.

Regards
Suman Brar
Program Manager Autosupport
408-822-3492

> -----Original Message-----
> From: Tuc [mailto:ttsg@ttsg.com]
> Sent: Sunday, December 24, 2000 6:18 AM
> To: chris(a)webtop.com
> Cc: toasters(a)mathworks.com
> Subject: Re: Bouncing autosupport?
>
> > In article <200012240531.AAA01495(a)heimdall.ttsg.com>, Tuc wrote:
> > > Anyone just rudely woken by :
> >
> > Oh yes [Grumbles at people who break their mail systems]
>
> I normally don't get upset when people break their mail systems.
> However, this time, I'm upset for 2 reasons :
>
> 1) What if my machine needed to log an autosupport problem that it
>    threw a disk, or something else.
> 2) I don't like getting woken up at :
>
>    00:18:04
>    00:26:33
>    02:09:01
>    02:36:28
>    04:17:12
>    04:17:30
>    04:18:01
>    04:18:02
>    05:35:06
>    and then finally the one at
>    05:35:07
>
>    which only set off 1 monitoring device, not the other. So
>    I had to get up and work for a 1/2 hour to get it running.
>
> Its bad enough when its my own stuff that has a problem keeping me awake...
> BUT A VENDOR!
>
> Tuc/TTSG

Bouncing autosupport?
by Tuc 24 Dec '00

Hi,

Anyone just rudely woken by :

>The original message was received at Sat, 23 Dec 2000 21:00:05 -0800 (PST)
>from mx02.dmz.netapp.com [10.254.252.22]
>
>   ----- The following addresses had permanent fatal errors -----
>asupprodtest
>    (reason: service unavailable)
>    (expanded from: autosupport-private@herra)
>asuparch
>    (reason: service unavailable)
>    (expanded from: autosupport-private@herra)
>
>   ----- Transcript of session follows -----
>mail.local: unknown name:
>554 5.0.0 asupprodtest,asuparch... Service unavailable

When the autosupport emails bounced...?

Tuc/TTSG

snapmirror quarry...
by Premanshu Jain 23 Dec '00

I have a 400 GB volume with 230 GB of data on it. Can I snapmirror this to a smaller 300 GB volume?

Thanks,
prem

SCSI over IP available now
by Alvarado, Michael 22 Dec '00

It is called NDMP. www.ndmp.org

Re: Using Network Appliance F7xx's in a WAN environment.
by Paul Wiltsey 22 Dec '00

You can use UDP or TCP, and you can use NFS version 2 or 3. You will find that most of the performance-affecting tuning is done via the NFS client options. Use smaller rsize and wsize values (e.g. wsize=2048,rsize=2048) and try increasing your client UDP buffer if you choose to use NFS over UDP. Also increase your NFS retries to between 3 and 5 and the timeout to 10.

Remember that, over a WAN link that may already be full, TCP has slightly more overhead in total packets transmitted (provided UDP does not have too many retransmissions).

Have fun with the tuning, and keep track of performance (copy time) with all rsize and wsize values between 512 and 32k. Try small files and larger files. Test both a read from the server and a write to the server.

----- Original Message -----
From: Matthew Stier <Matthew.Stier(a)fnc.fujitsu.com>
Date: Thursday, December 21, 2000 3:24 pm
Subject: Using Network Appliance F7xx's in a WAN environment.

> Any known filer tweaks to improve performance of NFS across a Wide Area
> Network?
>
> --
> Matthew Lee Stier * Fujitsu Network Communications
> Unix Systems Administrator | Two Blue Hill Plaza
> Ph: 914-731-2097 Fx: 914-731-2011 | Sixth Floor
> Matthew.Stier(a)fnc.fujitsu.com * Pearl River, NY 10965
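
The rsize/wsize sweep suggested in the reply above could be planned with something like the following sketch. It only builds mount option strings; it does not mount anything or time copies. The option names (vers, proto, rsize, wsize, retrans, timeo) are common NFS client mount options, but mapping "retries" to retrans and "timeout to 10" to timeo=10 is an assumption, as are the function names and the powers-of-two step; check your client's mount documentation for exact names and units.

```python
def candidate_sizes(lo=512, hi=32768):
    """Return power-of-two transfer sizes from lo to hi inclusive (assumed step)."""
    sizes = []
    n = lo
    while n <= hi:
        sizes.append(n)
        n *= 2
    return sizes


def mount_options(rsize, wsize, proto="udp", vers=3, retrans=5, timeo=10):
    """Build a comma-separated NFS mount option string for one test run.

    retrans and timeo are assumed equivalents of the "retries" and
    "timeout" mentioned in the reply; units depend on the client OS.
    """
    return (
        f"vers={vers},proto={proto},rsize={rsize},wsize={wsize},"
        f"retrans={retrans},timeo={timeo}"
    )


def sweep(proto="udp", vers=3):
    """Return option strings for every rsize/wsize combination to be timed."""
    sizes = candidate_sizes()
    return [mount_options(r, w, proto=proto, vers=vers) for r in sizes for w in sizes]


if __name__ == "__main__":
    # Print one line per configuration; feed each to your own mount and
    # copy-timing procedure, for both small and large files.
    for options in sweep():
        print(options)
```

Recording the copy time for each option string, separately for reads and writes, gives the comparison the reply recommends.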

RE: ndmpcoy maxes the CPU on 840
by Traitel, Eyal 22 Dec '00

Which versions are the filers running?

Eyal.

----------------------------------------------------------------------
eTraitel - I'm the new eBuzzword around !!!
----------------------------------------------------------------------
Eyal Traitel - Filer Escalation Engineer
CNA, MCSE, CSA, NetApp CA
Network Appliance BV
Holland Office Center
Kruisweg 799b
2132 NG, Hoofddorp
The Netherlands
Office: +31 23 567 9685
Cellular: +31 6 5497 2568
Email: eyal(a)netapp.com
----------------------------------------------------------------------
Get answers NOW! - NetApp On the Web - http://now.netapp.com
----------------------------------------------------------------------

-----Original Message-----
From: Premanshu Jain [mailto:PrJain@shastanets.com]
Sent: Friday, December 22, 2000 3:38 AM
To: toasters
Subject: ndmpcoy maxes the CPU on 840

Whenever I fire an ndmpcopy from a 740 to 840, it immediately maxes cpu (100%) on the 840. Only a reboot brings CPU activity down then..

Any clues....????

Here is the final error message thrown...

bot: LOG: RESTORE:
bot: LOG: Thu Dec 21 17:33:12 2000: Restoring NT ACLs.
filebot: LOG: DUMP:
filebot: LOG: dumping (Pass V) [ACLs]
filebot: LOG: DUMP:
filebot: LOG: 87029 KB
filebot: LOG: DUMP:
filebot: LOG: DUMP IS DONE
filebot: LOG: DUMP:
filebot: LOG: Deleting "/vol/sw/../snapshot_for_backup.150" snapshot.
bot: HALT: The operation was successful!
Waiting for filebot to halt too.
The transfer was successful, but the source filer hasn't halted yet.
filebot: HALT: The operation was successful!
Broken Pipe
mtshasta% Data server never halted, closing connection.
Broken Pipe

ndmpcoy maxes the CPU on 840
by Premanshu Jain 22 Dec '00

Whenever I fire an ndmpcopy from a 740 to 840, it immediately maxes cpu (100%) on the 840. Only a reboot brings CPU activity down then..

Any clues....????

Here is the final error message thrown...

bot: LOG: RESTORE:
bot: LOG: Thu Dec 21 17:33:12 2000: Restoring NT ACLs.
filebot: LOG: DUMP:
filebot: LOG: dumping (Pass V) [ACLs]
filebot: LOG: DUMP:
filebot: LOG: 87029 KB
filebot: LOG: DUMP:
filebot: LOG: DUMP IS DONE
filebot: LOG: DUMP:
filebot: LOG: Deleting "/vol/sw/../snapshot_for_backup.150" snapshot.
bot: HALT: The operation was successful!
Waiting for filebot to halt too.
The transfer was successful, but the source filer hasn't halted yet.
filebot: HALT: The operation was successful!
Broken Pipe
mtshasta% Data server never halted, closing connection.
Broken Pipe

Using Network Appliance F7xx's in a WAN environment.
by Matthew Stier 21 Dec '00

Any known filer tweaks to improve performance of NFS across a Wide Area Network?

--
Matthew Lee Stier * Fujitsu Network Communications
Unix Systems Administrator | Two Blue Hill Plaza
Ph: 914-731-2097 Fx: 914-731-2011 | Sixth Floor
Matthew.Stier(a)fnc.fujitsu.com * Pearl River, NY 10965

Re: Toaster newbie questions
by Paul Wiltsey 21 Dec '00

I agree with the preference of Jamey

----- Original Message -----
From: Jamey Maze <jmaze(a)netapp.com>
Date: Thursday, December 21, 2000 1:00 pm
Subject: Re: Toaster newbie questions

> Qtrees are the preferred way to organize things for any application where
> you can't justify a separate volume. With one shelf, I'd recommend you have
> a single 6-drive volume and one hot spare. You can delete qtrees like any
> other folder/directory.
>
> At 11:44 AM 12/21/00 -0500, lgkloft(a)usgs.gov wrote:
> >I am setting up my first toaster - an F720 with one shelf (seven 36GB
> >drives). I'm seeking some "real-life" experiences and recommendations based
> >on what I am trying to do.
> >
> >I work in a small office with approximately 50 employees and utilize UNIX
> >and NT operating systems. I currently have several $HOME, $PROJECT, and
> >GIS-related filesystems residing on my UNIX system. I plan to move these
> >filesystems to the filer, but will be unable to create a separate volume
> >for each filesystem. I considered creating two raid-groups, one for $HOME
> >and one for $PROJECT, but I think there will be a performance hit (only 1-2
> >data disks per volume) and it will cost me two drives for parity, not to
> >mention the spare disk drive. I am now looking at employing one large
> >volume (the root volume) and establishing Qtrees to establish and manage
> >HOME, PROJECT, and GIS areas within this volume, which would require NFS
> >and CIFS access. I am also considering the idea of creating an additional
> >Qtree to maintain user's roaming profiles for the NT environment - which
> >would require CIFs access only. I don't like the idea of doing this all in
> >one volume (especially /vol/vol0), but it looks as if this is the route I
> >will be taking unless someone has a better recommendation. Finally, how
> >difficult, or is it even possible, to remove a Qtree? I didn't find
> >anything in the SA Guide to do this.
> >
> >Thanks,
> >Loren
>
> --
> James N. (Jamey) Maze    Phone: 615-496-4799
> Network Appliance SE     www.netapp.com
