Looks like you are doing pretty good...
Keep in mind, it's an average, so for specific data sets in the volume it may be much higher (or lower). Workload measurements (statit during specific operations) can sometimes tell you much more.
Filling the volume is typically a Bad Idea (tm), as WAFL will spend more time finding free space to write to. If this is a flexvol and the aggr has tons of space available, this is not a concern.
7.x has great improvements for fixing fragmentation, and preventative measures to keep it from becoming an issue.
Remember: fragmentation is NORMAL with any filesystem and isn't necessarily a problem. What fragmentation _does_ do is introduce latencies that can be detrimental to responsiveness... sometimes that can be felt (i.e., it's a problem), sometimes it can't...
Glenn
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Lori Barfield
Sent: Tuesday, March 07, 2006 10:57 PM
To: toasters@mathworks.com
Subject: Re: wafl scan reallocate
since toasters said it was harmless, just to find out what kind of fragmentation our normal activity creates, i ran wafl scan measure_layout on our busiest filer.
the volume is only about 3-4 months young, but it does get pretty full on a regular basis. here's the output that appeared in my messages file:
Tue Mar  7 15:16:36 PST [wafl.scan.start:info]: Starting WAFL layout measurement on volume vol1.
Tue Mar  7 16:04:06 PST [wafl.scan.layout.advise:info]: WAFL layout ratio for volume vol1 is 1.26. A ratio of 1 is optimal. Based on your free space, 6.53 is expected.
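(editor's note: for anyone who wants to script against these messages, the advise line is easy to pick apart. A minimal sketch — the regex is an assumption based on the sample line above, not a documented format:)

```python
import re

# Parser for the wafl.scan.layout.advise log message.
# Assumption: the message text always follows the shape seen above.
ADVISE_RE = re.compile(
    r"WAFL layout ratio for volume (\S+) is ([\d.]+)\."
    r".*?Based on your free space, ([\d.]+) is expected\."
)

def parse_advise(line):
    """Return (volume, actual_ratio, expected_ratio), or None if no match."""
    m = ADVISE_RE.search(line)
    if not m:
        return None
    return m.group(1), float(m.group(2)), float(m.group(3))

line = ("Tue Mar  7 16:04:06 PST [wafl.scan.layout.advise:info]: "
        "WAFL layout ratio for volume vol1 is 1.26. A ratio of 1 is "
        "optimal. Based on your free space, 6.53 is expected.")
vol, actual, expected = parse_advise(line)
print(vol, actual, expected)  # vol1 1.26 6.53
```

(the measured ratio, 1.26, is much closer to optimal, 1.0, than what the scanner would consider typical for a volume with that little free space, 6.53 — which is presumably why Glenn says this volume is doing pretty well.)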
i haven't read up on this at all yet...would someone like to interpret this for us wafl scan newbies?
...lori
On 3/7/06, Glenn Walker ggwalker@mindspring.com wrote:
thanks, glenn. can you tell us what this means?
Based on your free space, 6.53 is expected.
...lori
Hi Lori,
This may answer your question:
http://forums.netapp.com/searchresults.asp?pos=5&page=2&searchterm=w...
Lori, from your log snippet, I have a few questions, if you have some time to reply:
1. Did the scan take ~1hr to run?
2. What size was the volume in question?
3. Did you see any performance impact on the filer?
4. Are you snapmirroring the volume in question? If so, did you turn off snapmirroring for the duration of the scan?
Thanks a lot,
Philip
On 3/8/06, Philip Boyle philip.boyle@eircom.net wrote:
This may answer your question:
http://forums.netapp.com/searchresults.asp?pos=5&page=2&searchterm=w...
yum, thanks. :)
Lori, from your log snippet, I have a few questions, if you have some time to reply:
- Did the scan take ~1hr to run?
i wasn't watching sysstat, but the messages file wants us to believe it took 48 minutes during prime time.
- What size was the volume in question?
/vol/vol1/           813694976KB  766782408KB   46912568KB   94%  /vol/vol1/
/vol/vol1/.snapshot   25165824KB   23847548KB    1318276KB   95%  /vol/vol1/.snapshot
(a flexvol.)
- Did you see any performance impact on the filer?
certainly not with the human eye. but we are overpowered and rarely run into user-detectable performance issues on the filer itself (even when i'm multithreading massive level 0 dumps from our sun server farm).
i didn't have a benchmark so i didn't bother with any kind of detailed measurements.
- Are you snapmirroring the volume in question? If so, did you turn off snapmirroring for the duration of the scan?
no snapmirroring.
...lori
another data point for the group. the previous scan was run on our busiest filer (an fas270); below is the impact of doing a scan on our largest user data filer.
this filer is an fas270 with one ds14. there are 11 disks in the raid group (and aggregate). the volume is only 3-4 months old, but storage utilization vacillates wildly on this puppy; it usually operates above 95% and has run out of space several times in its young life. we are not snapmirroring. i think we are currently using 995gb (.97tb) on the volume including snap, but someone should check my calculation (df below).
i started the scan at the beginning of lunchtime, when the filer happened to be nearly quiescent. it took 54 minutes. it killed my cache but only sucked up 7-12% of the cpu.
i imagine a low fragmentation condition would make a scan less taxing.
...lori
/vol/vol1/           1132462080KB  917995432KB  214466648KB   81%  /vol/vol1/
/vol/vol1/.snapshot   125829120KB  153743936KB          0KB  122%  /vol/vol1/.snapshot
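(editor's note: checking the ~995gb figure above — one way to read the df output, assuming "used including snap" means volume used space plus the full snapshot reserve, and counting 1 GB as 1024**2 KB:)

```python
# Sanity-check of the ~995 GB (.97 TB) figure from the df output above.
# Assumption: the total is volume used space plus the snapshot reserve
# (the .snapshot line's size column); 1 GB = 1024**2 KB here.
vol_used_kb = 917995432        # /vol/vol1/ used
snap_reserve_kb = 125829120    # /vol/vol1/.snapshot total (the reserve)

total_gb = (vol_used_kb + snap_reserve_kb) / 1024**2
print(round(total_gb, 1))  # 995.5
```

(995.5 GB is about 0.97 TB, so the calculation checks out under that reading.)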
cartman*> sysstat 1
 CPU   NFS  CIFS  HTTP   Net kB/s   Disk kB/s   Tape kB/s  Cache
                         in   out   read write  read write   age
  1%     0     0     0    0     0      0     0     0     0   >60
  1%     0     0     0    0     0      0     0     0     0   >60
  1%     0     0     0    0     0      0     0     0     0   >60
  1%     0     0     0    0     0      0     0     0     0   >60
  1%     0     0     0    0     0      0     0     0     0   >60
  4%     0     0     0    0     0    607   563     0     0   >60
cartman*> wafl scan measure_layout vol1
cartman*> Wed Mar  8 12:15:05 PST [wafl.scan.start:info]: Starting WAFL layout measurement on volume vol1.
cartman*> sysstat 1
 CPU   NFS  CIFS  HTTP   Net kB/s   Disk kB/s   Tape kB/s  Cache
                         in   out   read write  read write   age
  8%     0     0     0    0     0    583     0     0     0    15
 10%     0     0     0    0     0   1168   572     0     0    15
  7%     0     0     0    0     0    636     0     0     0    15
  8%     0     0     0    0     0    648     0     0     0    15
  8%     0     0     0    0     0    632     0     0     0    15
  8%     0     0     0    0     0    660     0     0     0    15
  7%     0     0     0    0     0    636     0     0     0    15
  9%     0     0     0    0     0    696     0     0     0    15
  8%     0     0     0    0     0    696     0     0     0    15
 12%     0     0     0    0     0    844     0     0     0    15
  7%     0     0     0    0     0    668     0     0     0    16
 CPU   NFS  CIFS  HTTP   Net kB/s   Disk kB/s   Tape kB/s  Cache
                         in   out   read write  read write   age
  7%     0     0     0    0     0    592     0     0     0     4
  8%     0     0     0    1     0    736     0     0     0     4
 11%     0     0     0    0     0    756     0     0     0     4
 10%     0     0     0    0     0    864     0     0     0     4
 11%     0     0     0    0     0    920     0     0     0     4
  8%     0     0     0    0     0    644     0     0     0     4
cartman*> Wed Mar 8 13:09:00 PST [wafl.scan.layout.advise:info]: WAFL layout ratio for volume vol1 is 1.22. A ratio of 1 is optimal. Based on your free space, 3.94 is expected.
Hi Lori,
Thanks a lot for the feedback.
According to the logs, the wafl scan took ~1hr to complete on a ~200GB system. I received a ratio of 1.02 :-)
My system is showing a 100% CPU spike for 10-15 seconds when I'm mirroring data. NetApp figured it may be fragmentation on the disk. Their theory re fragmentation relates to the ratio of Disk kB/s read versus Net kB/s. From the sysstat output and perfstat data they indicate that I have a ratio of 3:1, where 1:1 should be the norm.
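(editor's note: the heuristic NetApp describes can be eyeballed from sysstat samples. A rough sketch — the column picks follow the Disk kB/s read vs. Net kB/s comparison described above, and the sample numbers are made up for illustration, not from a real filer:)

```python
# Disk-read vs. network-out ratio, per the heuristic described above.
# Assumption: sum "Disk kB/s read" and "Net kB/s out" over the sysstat
# intervals and compare; sample values below are illustrative only.
samples = [
    {"net_out_kbps": 200, "disk_read_kbps": 600},
    {"net_out_kbps": 300, "disk_read_kbps": 900},
    {"net_out_kbps": 250, "disk_read_kbps": 750},
]

total_net = sum(s["net_out_kbps"] for s in samples)
total_disk = sum(s["disk_read_kbps"] for s in samples)
ratio = total_disk / total_net
print(f"disk-read:net ratio = {ratio:.1f}:1")
```

(a sustained ratio well above 1:1 means the filer is reading far more from disk than it is sending out on the wire, which is consistent with the fragmentation theory — extra disk I/O for each block actually served.)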
Has anyone else come across this issue? I'm seeing it on systems running ONTAP 6.5.3P4 & 7.0.1R1.
From my research on the NOW site it appears to indicate that this is a known bug: http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=140170
NetApp has asked me to review layer 2 flow control throughout my network and also to review the block size mount options.
Is anyone out there running NFS with FreeBSD who has seen any performance benefits from changing the block size? Our current block size is 8k.
Thanks, Philip