toasters June 2008

toasters@lists.teaparty.net

58 participants
52 discussions

Re: How to identify a hot disk
by Blackmor, Chris 11 Jun '08

My understanding of reallocate is that it will put a definite load on your filer and shouldn't be run during prime time.

----- Original Message -----
From: owner-toasters(a)mathworks.com <owner-toasters(a)mathworks.com>
To: 'Page, Jeremy' <jeremy.page(a)gilbarco.com>; toasters(a)mathworks.com <toasters(a)mathworks.com>
Sent: Thu Jun 05 10:57:12 2008
Subject: RE: How to identify a hot disk

It’s good you are at 50% aggr usage, as you’ll need 50% free space in each volume you run the reallocate on. I think running the reallocate is the best first step, as it is fairly non-intrusive and you can run it during the day unless you are hammering the filer constantly. When we run it we use the -f parameter to force reallocation regardless of how well the volume is already laid out. Not sure about your question on A-SIS.

HTH,
- Hadrian

From: owner-toasters(a)mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Page, Jeremy
Sent: Thursday, June 05, 2008 7:13 AM
To: toasters(a)mathworks.com
Subject: RE: How to identify a hot disk

Thu Jun 5 10:06:02 EDT [gvr-array02: wafl.scan.layout.advise:info]: WAFL layout ratio for volume nfs2 is 4.01. A ratio of 1 is optimal. Based on your free space, 1.42 is expected.

Would you say I need to do a reallocate? I’m not sure why this is so fragmented; this file system has never been more than 50% full. Could A-SIS have something to do with it?

Jeremy M. Page
Systems Architect
email: Jeremy.Page@gilbarco.com
phone: 336.547.5399
fax: 336.547.5163
cell: 336.601.7274
________________________________
From: Uddhav Regmi [mailto:uddhav.regmi@worldnet.att.net]
Sent: Thursday, June 05, 2008 9:38 AM
To: Page, Jeremy; toasters(a)mathworks.com
Subject: RE: How to identify a hot disk

Hmmm, very interesting... looks like those are maxed out. Do a wafl scan measure layout and see where you stand; if needed, do a reallocate. I have seen hundreds of cases where it helped a lot.
-uddhav

From: owner-toasters(a)mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Page, Jeremy
Sent: Thursday, June 05, 2008 8:27 AM
To: toasters(a)mathworks.com
Subject: Re: How to identify a hot disk

Not sure why, but I have two disks that are maxed out while the rest are at far lower utilization. What would cause this? The RAID groups were all created at the same time; there are 10 disks per RAID group and 3 groups in the aggregate.

disk   ut%   xfers  ureads--chain-usecs  writes--chain-usecs  cpreads-chain-usecs  greads--chain-usecs  gwrites-chain-usecs
/aggr0/plex0/rg0:
1c.16    2    0.94    0.18  1.00   42250  0.49  12.18  1396  0.27  11.83   521  0.00  ....  .  0.00  ....  .
1c.32    6    2.56    0.49  1.00  104545  1.80   4.18  1323  0.27  11.83   634  0.00  ....  .  0.00  ....  .
1c.48   98  116.05  114.20  1.62   18481  1.53   4.24  1743  0.31  11.00  2649  0.00  ....  .  0.00  ....  .
1c.17   41   55.21   54.76  2.17    5697  0.27  20.17  1537  0.18  16.25   600  0.00  ....  .  0.00  ....  .
1c.33   49   72.07   71.62  1.84    4760  0.27  20.67   871  0.18  16.75  1239  0.00  ....  .  0.00  ....  .
1c.49   45   61.42   60.97  2.18    4047  0.22  23.00   913  0.22  14.40   931  0.00  ....  .  0.00  ....  .
1c.64    5  117.40  116.86  1.61     307  0.36  15.25  1336  0.18  18.00   319  0.00  ....  .  0.00  ....  .
1c.65    4   71.76   71.17  1.83     292  0.27  20.67  1298  0.31  10.43   370  0.00  ....  .  0.00  ....  .
1c.80    4   49.50   49.05  2.36     333  0.22  22.80  1333  0.22  14.60   712  0.00  ....  .  0.00  ....  .
1c.81    5   95.14   94.69  1.71     292  0.22  22.80  1325  0.22  14.60   548  0.00  ....  .  0.00  ....  .
/aggr0/plex0/rg1:
1c.66    1    0.67    0.00  ....       .  0.31  19.29  1311  0.36  10.50   238  0.00  ....  .  0.00  ....  .
1c.83    1    0.67    0.00  ....       .  0.31  19.29  1415  0.36  10.50   190  0.00  ....  .  0.00  ....  .
1c.82    4   48.60   48.24  2.21     320  0.22  22.80  1553  0.13  21.33   234  0.00  ....  .  0.00  ....  .
1c.19   51   67.89   67.44  1.92    5315  0.22  22.80  1281  0.22  13.20   788  0.00  ....  .  0.00  ....  .
1c.18   55   72.75   72.34  1.90    4996  0.22  23.00  1122  0.18  16.00  1109  0.00  ....  .  0.00  ....  .
1c.35   30   36.10   35.52  2.67    3190  0.31  15.86  1802  0.27  11.33   588  0.00  ....  .  0.00  ....  .
1c.34   41   52.97   52.43  2.04    4207  0.31  16.29  1570  0.22  13.60   750  0.00  ....  .  0.00  ....  .
1c.51  100  119.82  119.46  1.57   25873  0.22  22.80  1588  0.13  21.33  2313  0.00  ....  .  0.00  ....  .
1c.50   59   72.03   71.44  1.83    7750  0.27  19.33  1233  0.31  10.86  1289  0.00  ....  .  0.00  ....  .
1c.67    4   94.60   94.15  1.68     279  0.18  24.50  1071  0.27  12.33   338  0.00  ....  .  0.00  ....  .
/aggr0/plex0/rg2:
1c.85    1    0.94    0.00  ....       .  0.54  12.00  1806  0.40  10.11   538  0.00  ....  .  0.00  ....  .
1c.84    1    0.94    0.00  ....       .  0.54  12.00  1729  0.40  10.11   495  0.00  ....  .  0.00  ....  .
1c.21   47   67.40   66.90  1.86    4701  0.27  19.17  1452  0.22  14.20   845  0.00  ....  .  0.00  ....  .
1c.20   56   73.56   73.02  1.79    4866  0.18  25.50  1039  0.36   9.63   870  0.00  ....  .  0.00  ....  .
1c.37   51   68.79   68.21  1.76    6072  0.27  19.17  1174  0.31  11.43   988  0.00  ....  .  0.00  ....  .
1c.36   42   50.85   50.18  2.31    3807  0.40  12.78  1852  0.27  13.33   800  0.00  ....  .  0.00  ....  .
1c.53   59   75.85   75.18  1.86    5024  0.36  14.25  2237  0.31  10.43   493  0.00  ....  .  0.00  ....  .
1c.52   18   21.27   20.77  3.83    2205  0.27  19.17  1496  0.22  14.20   465  0.00  ....  .  0.00  ....  .
1c.69    5   71.76   71.00  1.96     296  0.40  12.78  2087  0.36   9.63   610  0.00  ....  .  0.00  ....  .
1c.68    5   71.58   71.13  1.84     352  0.22  23.60  1314  0.22  14.80   514  0.00  ....  .  0.00  ....  .

This message (including any attachments) contains confidential and/or proprietary information intended only for the addressee. Any unauthorized disclosure, copying, distribution or reliance on the contents of this information is strictly prohibited and may constitute a violation of law. If you are not the intended recipient, please notify the sender immediately by responding to this e-mail, and delete the message from your system. If you have any questions about this e-mail please notify the sender immediately.
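
One way to spot hot disks like 1c.48 and 1c.51 programmatically is to parse the per-disk rows and compare each disk's ut% with the rest of its RAID group. This is a minimal Python sketch, assuming the rows keep the layout shown above (disk name first, integer ut% second, RAID group lines ending in ":"). The 90% threshold and the function names are hypothetical choices, not anything ONTAP provides.

```python
import statistics
from typing import Dict, List, Tuple

def parse_statit_disks(text: str) -> Dict[str, List[Tuple[str, int]]]:
    """Group (disk, ut%) pairs by RAID group from statit-style disk lines."""
    groups: Dict[str, List[Tuple[str, int]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/") and line.endswith(":"):
            current = line[:-1]  # e.g. "/aggr0/plex0/rg0"
            groups[current] = []
            continue
        fields = line.split()
        # Disk rows: name first, integer ut% second; the header row ("ut%") is skipped.
        if current is not None and len(fields) >= 2 and fields[1].isdigit():
            groups[current].append((fields[0], int(fields[1])))
    return groups

def hot_disks(groups: Dict[str, List[Tuple[str, int]]], threshold: int = 90):
    """Return (group, disk, ut%, group median ut%) for disks at or above threshold."""
    hot = []
    for group, disks in groups.items():
        if not disks:
            continue
        median = statistics.median(ut for _, ut in disks)
        for disk, ut in disks:
            if ut >= threshold:
                hot.append((group, disk, ut, median))
    return hot

# Rows trimmed to their first three columns, taken from the output above.
SAMPLE = """\
/aggr0/plex0/rg0:
1c.16    2    0.94
1c.48   98  116.05
1c.17   41   55.21
1c.33   49   72.07
/aggr0/plex0/rg1:
1c.18   55   72.75
1c.51  100  119.82
1c.50   59   72.03
"""

for group, disk, ut, median in hot_disks(parse_statit_disks(SAMPLE)):
    print(f"{group}: {disk} at {ut}% (group median {median}%)")
```

Comparing against the group median rather than a fixed number makes a single outlier stand out even when the whole group is moderately busy.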

Re: strange behaviour, Linux and NFS on NTFS qtree
by Webster, Stetson 11 Jun '08

Use 'ifstat -z' to zero the stats and see whether the errors are still occurring. If they are, and they are still CRC errors, check the hardware starting with the easiest item first: the cable. This would be why UDP *seems* to work while TCP "blows the whistle" on the errors.

----- Original Message -----
From: James Beal <james.beal(a)sanger.ac.uk>
To: Webster, Stetson
Cc: dh3(a)sanger.ac.uk <dh3(a)sanger.ac.uk>; toasters(a)mathworks.com <toasters(a)mathworks.com>; Blakemore, Steven
Sent: Wed Jun 11 13:40:52 2008
Subject: Re: strange behaviour, Linux and NFS on NTFS qtree

Webster, Stetson wrote:
>
> Any errors or strange numbers in 'ifstat -av' ?
>

netapp1a*> ifstat -v e4a

-- interface e4a (7 days, 10 hours, 23 minutes, 16 seconds) --

RECEIVE
 Frames/second:      311 | Bytes/second:     2251k | Errors/minute:     4864
 Discards/minute:      0 | Total frames:      277m | Total bytes:     63451m
 Total errors:    36939k | Total discards:       0 | Multi/broadcast:   108k
 No buffers:           0 | Non-primary u/c:   101k | Tag drop:             0
 Vlan tag drop:        0 | Vlan untag drop:      0 | Mac octets:        132g
 UCast pkts:       1187m | MCast pkts:       17000 | BCast pkts:       91533
 CRC errors:      36939k | Bus overrun:          0 | Alignment errors:     0
 Long frames:          0 | Jabber:               0 | Pause frames:         0
 Runt frames:          0 | Symbol errors:        0 | Jumbo frames:         0
TRANSMIT
 Frames/second:      433 | Bytes/second:     1506k | Errors/minute:        0
 Discards/minute:      0 | Total frames:      563m | Total bytes:      3323g
 Total errors:         0 | Total discards:       0 | Multi/broadcast:   7559
 Queue overflows:      0 | No buffers:           0 | Frames queued:        0
 Buffer coalesces:     4 | MTUs too big:         0 | Mac octets:       3462g
 UCast pkts:       2452m | MCast pkts:        2922 | BCast pkts:        4637
 Bus underruns:        0 | Pause frames:         0 | Jumbo frames:         0
LINK_INFO
 Current state:       up | Up to downs:          1 | Speed:           10000m
 Duplex:            full | Flowcontrol:       full

It's using a 10Gig card with no vifs (I believe).

When we stress the system we see the following on the console, although when we do this test we don't see the problem (we see it when I throw about 1000 cores at the system, the 10Gig card is at about 60% and the CPU is around 100%). I have asked our presales engineer about this.

>> XXX restart_tx
>> restart_offloadq
>> XXX restart_tx
>> XXX restart_tx
>> restart_offloadq
>> XXX restart_tx
>> XXX restart_tx

> ----- Original Message -----
> From: Dave Holland <dh3(a)sanger.ac.uk>
> To: Webster, Stetson
> Cc: toasters(a)mathworks.com <toasters(a)mathworks.com>
> Sent: Wed Jun 11 11:58:38 2008
> Subject: Re: strange behaviour, Linux and NFS on NTFS qtree
>
> On Wed, Jun 11, 2008 at 11:30:22AM -0400, Webster, Stetson wrote:
> > What ONTAP release, what Linux kernel, what NFS mount options?
>
> I knew I'd missed things...
>
> It's ONTAP 7.2.4. I can upgrade to 7.2.5 if that'll help.
>
> The Linux kernel is 2.6.18-6-686 (Debian 4.0, 2.6.18.dfsg.1-18etch4),
> and the problem also shows with 2.6.8-2-686-smp and
> 2.6.5--286tg3susesfs.
>
> I noticed this when mounting with proto=tcp,vers=3,rsize=8192,wsize=8192.
> I'd also tried UDP, and the problem persisted.
>
> But after your email I tried vers=2 and the problem goes away (with both
> TCP and UDP) which is interesting indeed. Although with the crazy size
> files and filesystems around here, NFSv3 is very desirable.

--
james beal

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
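
Following the 'ifstat -z' suggestion, a quick way to tell whether CRC errors are still arriving is to pull the receive-side counters out of a later 'ifstat -v' snapshot. This is a minimal Python sketch, assuming the "Name: value | Name: value" layout shown above and assuming the k/m/g suffixes are decimal multipliers (the post does not confirm this). The function names are hypothetical.

```python
from typing import Dict, Optional

# Assumption: ifstat's k/m/g suffixes are decimal multipliers.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}
SECTIONS = ("RECEIVE", "TRANSMIT", "LINK_INFO")

def parse_ifstat(text: str) -> Dict[str, Dict[str, str]]:
    """Split 'ifstat -v' output into {section: {counter name: raw value}}."""
    parsed: Dict[str, Dict[str, str]] = {}
    section: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in SECTIONS:
            section = line
            parsed[section] = {}
        elif section is not None:
            # Each line holds up to three "Name: value" cells separated by '|'.
            for cell in line.split("|"):
                name, sep, value = cell.partition(":")
                if sep:
                    parsed[section][name.strip()] = value.strip()
    return parsed

def to_number(value: str) -> int:
    """Convert a counter such as '36939k' or '4864' to an integer."""
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)

def crc_errors(text: str) -> int:
    """Receive-side CRC error count from one 'ifstat -v' snapshot."""
    return to_number(parse_ifstat(text)["RECEIVE"]["CRC errors"])

# Excerpt of the RECEIVE section posted above.
SAMPLE = """\
RECEIVE
 Frames/second:      311 | Bytes/second:     2251k | Errors/minute:     4864
 Total errors:    36939k | Total discards:       0 | Multi/broadcast:   108k
 CRC errors:      36939k | Bus overrun:          0 | Alignment errors:     0
"""

print("CRC errors:", crc_errors(SAMPLE))
```

Run against a snapshot taken a few minutes after zeroing the counters, any nonzero result means CRC errors are still being received on that interface.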

Re: strange behaviour, Linux and NFS on NTFS qtree
by Webster, Stetson 11 Jun '08

Any errors or strange numbers in 'ifstat -av'?

----- Original Message -----
From: Dave Holland <dh3(a)sanger.ac.uk>
To: Webster, Stetson
Cc: toasters(a)mathworks.com <toasters(a)mathworks.com>
Sent: Wed Jun 11 11:58:38 2008
Subject: Re: strange behaviour, Linux and NFS on NTFS qtree

On Wed, Jun 11, 2008 at 11:30:22AM -0400, Webster, Stetson wrote:
> What ONTAP release, what Linux kernel, what NFS mount options?

I knew I'd missed things...

It's ONTAP 7.2.4. I can upgrade to 7.2.5 if that'll help.

The Linux kernel is 2.6.18-6-686 (Debian 4.0, 2.6.18.dfsg.1-18etch4), and the problem also shows with 2.6.8-2-686-smp and 2.6.5--286tg3susesfs.

I noticed this when mounting with proto=tcp,vers=3,rsize=8192,wsize=8192. I'd also tried UDP, and the problem persisted.

But after your email I tried vers=2 and the problem goes away (with both TCP and UDP), which is interesting indeed. Although with the crazy size files and filesystems around here, NFSv3 is very desirable.

thanks,
Dave
--
** Dave Holland ** Systems Support -- Infrastructure Management **
** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **

Re: strange behaviour, Linux and NFS on NTFS qtree
by Webster, Stetson 11 Jun '08

What ONTAP release, what Linux kernel, what NFS mount options?

----- Original Message -----
From: Dave Holland <dh3(a)sanger.ac.uk>
To: Toasters <toasters(a)mathworks.com>
Sent: Wed Jun 11 09:56:10 2008
Subject: strange behaviour, Linux and NFS on NTFS qtree

Hello,

I'm seeing some strange behaviour with a FAS3040 filer I have on evaluation at the moment. I have an NTFS-style qtree exported by NFS and CIFS. Debian Linux clients see odd behaviour relating to open() and stat64() system calls. This strace output from "vim" captures it in a nutshell:

uname({sys="Linux", node="acheron", ...}) = 0
stat64("ffff", 0xbfb4d030) = -1 ENOENT (No such file or directory)
stat64("ffff", 0xbfb4d0b0) = -1 ENOENT (No such file or directory)
access("ffff", W_OK) = -1 ENOENT (No such file or directory)
open("ffff", O_RDONLY) = -1 ENOENT (No such file or directory)
readlink("ffff", 0xbfb4c7cc, 1023) = -1 ENOENT (No such file or directory)
open(".ffff.swp", O_RDONLY) = -1 ENOENT (No such file or directory)
open(".ffff.swp", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
stat64(".ffff.swp", {st_mode=S_IFREG|0777, st_size=0, ...}) = 0

Note that the open(O_RDWR...) call fails with EACCES, but the following stat64() call succeeds: the file .ffff.swp was created on disk despite the reported failure of the open() call.

This behaviour is seen when using "vim" to edit files, and causes an error message about the swap file being present (because the swap file is created even though the open() return value implies it was not).

Trying the same "vim" command on a Tru64 NFS client, correct behaviour is seen: the open(O_RDWR...) succeeds and a filehandle is returned.

I asked Netapp support and they suggested a mixed-style qtree. But I've read on this list about the last-change-wins problem on permissions and ACLs, and I'd really rather not go there.

I've worked through the troubleshooting guide:
http://now.netapp.com/NOW/knowledge/docs/olio/guides/ontap_troubleshooting
with no luck.

Is there anything else I can configure (on the filer or on the clients) to avoid this problem?

Thanks for any suggestions!
Dave
--
** Dave Holland ** Systems Support -- Infrastructure Management **
** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **

CIFS quota question
by Zeeshan 11 Jun '08

Hello all,

I have a 100GB XXX CIFS share in a FlexVol qtree. The qtree size is 100GB. The qtree has only a USER quota restriction of 1GB per user. When the number of AD users (who have been allocated the XXX share as storage) exceeds 100:

a. Will ONTAP give me some sort of warning, such as "The volume or qtree has been fully allocated or is almost full", because allocation for 100 users has been made, even though the space actually used by the users' data is much less than 100GB?

b. Will AD be able to create more than 100 users and allocate the XXX share as storage for each user, or will ONTAP prevent AD from allocating the XXX share to more than 100 users?

Thanx

strange behaviour, Linux and NFS on NTFS qtree
by Dave Holland 11 Jun '08

Hello,

I'm seeing some strange behaviour with a FAS3040 filer I have on evaluation at the moment. I have an NTFS-style qtree exported by NFS and CIFS. Debian Linux clients see odd behaviour relating to open() and stat64() system calls. This strace output from "vim" captures it in a nutshell:

uname({sys="Linux", node="acheron", ...}) = 0
stat64("ffff", 0xbfb4d030) = -1 ENOENT (No such file or directory)
stat64("ffff", 0xbfb4d0b0) = -1 ENOENT (No such file or directory)
access("ffff", W_OK) = -1 ENOENT (No such file or directory)
open("ffff", O_RDONLY) = -1 ENOENT (No such file or directory)
readlink("ffff", 0xbfb4c7cc, 1023) = -1 ENOENT (No such file or directory)
open(".ffff.swp", O_RDONLY) = -1 ENOENT (No such file or directory)
open(".ffff.swp", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
stat64(".ffff.swp", {st_mode=S_IFREG|0777, st_size=0, ...}) = 0

Note that the open(O_RDWR...) call fails with EACCES, but the following stat64() call succeeds: the file .ffff.swp was created on disk despite the reported failure of the open() call.

This behaviour is seen when using "vim" to edit files, and causes an error message about the swap file being present (because the swap file is created even though the open() return value implies it was not).

Trying the same "vim" command on a Tru64 NFS client, correct behaviour is seen: the open(O_RDWR...) succeeds and a filehandle is returned.

I asked Netapp support and they suggested a mixed-style qtree. But I've read on this list about the last-change-wins problem on permissions and ACLs, and I'd really rather not go there.

I've worked through the troubleshooting guide:
http://now.netapp.com/NOW/knowledge/docs/olio/guides/ontap_troubleshooting
with no luck.

Is there anything else I can configure (on the filer or on the clients) to avoid this problem?

Thanks for any suggestions!
Dave
--
** Dave Holland ** Systems Support -- Infrastructure Management **
** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **
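
To check whether a given client and mount show this behaviour without involving vim, the exclusive-create sequence from the strace can be reproduced directly. This is a minimal Python sketch, assuming it is run as the affected user against a directory inside the NTFS-style qtree mount; the probe file name and the function name are hypothetical.

```python
import errno
import os

def check_exclusive_create(directory: str, name: str = ".excl_create_probe.swp") -> str:
    """Mimic vim's swap-file creation: open(O_RDWR|O_CREAT|O_EXCL, 0600), then stat.

    Returns "ok" if the create succeeds, "anomaly" if open() fails with EACCES
    but the file exists afterwards (the behaviour in the strace above), or
    "denied" if open() fails with EACCES and no file was created.
    Other errors (for example EEXIST from a leftover probe file) are re-raised.
    """
    path = os.path.join(directory, name)
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        if exc.errno != errno.EACCES:
            raise
        exists = os.path.exists(path)
        if exists:
            try:
                os.unlink(path)  # best-effort cleanup; may fail on the affected mount
            except OSError:
                pass
        return "anomaly" if exists else "denied"
    os.close(fd)
    os.unlink(path)
    return "ok"
```

For example, check_exclusive_create("/mnt/qtree") (a hypothetical mount point) returning "anomaly" would match the vim symptom, and running it from a client mounted with vers=2 versus vers=3 would show whether the protocol version changes the result.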

RE: At which point does high disk utilization impact IO requests?
by Christopher Mende 10 Jun '08

Normally, if a client is starting to see latency in an app, I use perfstat for a snapshot measurement and consider changes if disk utilization starts creeping up towards 80%. DFM/OM or SNMP-based monitoring tools have been useful for alerting on sustained high disk utilization or volume latency.

Christopher Mende
Sr Solutions Architect
Peak UpTime
+1 913 523 4263

________________________________
From: Hadrian Baron <Hadrian.Baron(a)vegas.com>
Sent: Tuesday, June 10, 2008 13:39
To: toasters(a)mathworks.com <toasters(a)mathworks.com>
Subject: At which point does high disk utilization impact IO requests?

When average disk utilization across all disks in an aggregate becomes high, IO requests are delayed. My question is: what should this threshold be?

I’ve heard from one performance consultant that 40% disk utilization is where you start seeing latency (this was in regard to an HDS array).

The Netapp TR-3525, Storage Performance Management using DFM, says 70% is the industry-accepted disk utilization threshold:
http://www-download.netapp.com/edm/TOT/docs/3525.pdf (page 17)

Here is some interesting EMC FUD comparing a 3050 and a CX3-40, which says the Netapp beats the CX3 until disk utilization passes 20%, after which latency increases dramatically:
http://www.dell.com/downloads/global/products/pvaul/en/netapp_performance_s… (page 7)

Every environment is different, so the proper answer is of course "It depends", and you should test for yourself. It depends on whether you can accept 5ms more latency, or 50ms more. It depends on what kind of IO you do. But my question is what *your* threshold is, and how can you automate the monitoring of disk utilization?

TIA,
Hadrian
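
The "alert on sustained high disk utilization" approach can be sketched as a rolling check over periodic samples. This is a minimal Python sketch, assuming per-disk ut% samples are already being collected somehow (for example from SNMP or parsed statit output; the collection side is not shown). The 80% default comes from the reply above, while the three-sample window and the class name are hypothetical choices.

```python
from collections import deque
from statistics import mean
from typing import Deque, Sequence

class SustainedUtilizationAlert:
    """Fire when the average disk utilization of an aggregate stays at or above
    `threshold` percent for `window` consecutive samples."""

    def __init__(self, threshold: float = 80.0, window: int = 3) -> None:
        self.threshold = threshold
        self.window = window
        # Keeps only the most recent `window` aggregate averages.
        self._recent: Deque[float] = deque(maxlen=window)

    def add_sample(self, per_disk_util: Sequence[float]) -> bool:
        """Record one sample (ut% for each disk) and return True if the alert fires."""
        self._recent.append(mean(per_disk_util))
        return (len(self._recent) == self.window
                and all(avg >= self.threshold for avg in self._recent))

alert = SustainedUtilizationAlert(threshold=80.0, window=3)
for sample in ([50, 60], [85, 90], [80, 90], [90, 95]):
    if alert.add_sample(sample):
        print("sustained high disk utilization:", sample)
```

Requiring several consecutive samples avoids paging on a single busy interval. An aggregate-wide average can also hide one hot disk (such as 1c.51 in the "How to identify a hot disk" thread), so a per-disk check is worth running alongside it.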

Map CIFS homedir based on share name instead of username?
by Adam McDougall 10 Jun '08

Without manually creating a share for each user, is it possible for the filer to use the supplied share name as the connecting user, instead of the username from the credentials? I am trying to mimic the Samba behavior where I can connect to \\samba\someuser with ANY credentials, and if those credentials do not work, it prompts me for a username and password.

I added our home directory paths in cifs_homedir.cfg and tried different settings for cifs.home_dir_namestyle, but they all seem to use the username from the connecting credentials, not the requested share name. For example, if I am logged into Windows as mcdouga9, I can connect to \\filer\mcdouga9 just fine, but if I connect to \\filer\anothervaliduser, it reports "\\filer\anothervaliduser No network provider accepted the given network path." I'd rather be prompted for credentials so I can enter the username and password. It does work if I supply the username and password while connecting.

Thanks.

At which point does high disk utilization impact IO requests?
by Hadrian Baron 10 Jun '08

When average disk utilization across all disks in an aggregate becomes high, IO requests are delayed. My question is: what should this threshold be?

I've heard from one performance consultant that 40% disk utilization is where you start seeing latency (this was in regard to an HDS array).

The Netapp TR-3525, Storage Performance Management using DFM, says 70% is the industry-accepted disk utilization threshold:
http://www-download.netapp.com/edm/TOT/docs/3525.pdf (page 17)

Here is some interesting EMC FUD comparing a 3050 and a CX3-40, which says the Netapp beats the CX3 until disk utilization passes 20%, after which latency increases dramatically:
http://www.dell.com/downloads/global/products/pvaul/en/netapp_performance_s… (page 7)

Every environment is different, so the proper answer is of course "It depends", and you should test for yourself. It depends on whether you can accept 5ms more latency, or 50ms more. It depends on what kind of IO you do. But my question is what *your* threshold is, and how can you automate the monitoring of disk utilization?

TIA,
Hadrian

Rename aggregate :: SnapLock and V-Series
by Kevin Parker 10 Jun '08

Hi Folks,

Have any of you renamed a SnapLock (Enterprise) aggregate? I don't see why renaming the aggr would have any effect, or even need to change any data within it. These are true aggregates (i.e., not tradvols) with FlexVols, on DOT 7.2.3 and above. Is there anything else to watch out for? What was your experience? Was it as smooth as an aggr rename on an otherwise standard aggr?

My next question is about renaming a V-Series aggregate. Any experience?

Cheers,
Best regards,
Kevin Parker
