So I am getting the following when running sysstat -u. As far as I know nothing should be hitting the array hard enough to drive disk utilization to 100%, much less hold it there. What can I do to further determine what's causing this?
All of my storage is NFS based at this time.
 29%     799  44261   1344  14900  54402     0     0    38   93%   46%   F   77%
 29%     896  43727   1938  14643  54488     0     0    38   93%   43%   F   94%
 28%     734  43831   1102  13251  54881     0     0    38   93%   44%   F   77%
 CPU   Total    Net kB/s     Disk kB/s    Tape kB/s  Cache Cache    CP  CP  Disk
       ops/s     in    out   read  write  read write   age   hit  time  ty  util
 31%     876  53275   1354  12054  61719     0     0    39   93%   53%   F   80%
 13%     264   8377    369  15069  14423     0     0    39   92%   31%   2   94%
  8%     107    674     33  13300   5331     0     0    39   92%   30%   T   94%
  7%     101    601    112  13263   1026     0     0    39   92%    7%   T  100%
  8%     100    626     55  15145    989     0     0    40   92%    6%   T   85%
  8%      76    520     76  13809   2049     0     0    40   92%   13%   T  100%
  8%      96    602     63  13637    893     0     0    40   92%    5%   T   93%
This message (including any attachments) contains confidential and/or proprietary information intended only for the addressee. Any unauthorized disclosure, copying, distribution or reliance on the contents of this information is strictly prohibited and may constitute a violation of law. If you are not the intended recipient, please notify the sender immediately by responding to this e-mail, and delete the message from your system. If you have any questions about this e-mail please notify the sender immediately.
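The Disk util column in the sysstat output above can be summarized with a few lines of Python. This is only a sketch: it assumes the output was captured as text in the 13-column `sysstat -u` layout shown, and the names `disk_util_values` and `share_at_or_above` and the 80% threshold are illustrative choices, not anything Data ONTAP provides.

```python
import re

# Rows copied from the sysstat -u output above, header lines included.
SYSSTAT_SAMPLE = """\
29% 799 44261 1344 14900 54402 0 0 38 93% 46% F 77%
29% 896 43727 1938 14643 54488 0 0 38 93% 43% F 94%
28% 734 43831 1102 13251 54881 0 0 38 93% 44% F 77%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
31% 876 53275 1354 12054 61719 0 0 39 93% 53% F 80%
13% 264 8377 369 15069 14423 0 0 39 92% 31% 2 94%
8% 107 674 33 13300 5331 0 0 39 92% 30% T 94%
7% 101 601 112 13263 1026 0 0 39 92% 7% T 100%
8% 100 626 55 15145 989 0 0 40 92% 6% T 85%
8% 76 520 76 13809 2049 0 0 40 92% 13% T 100%
8% 96 602 63 13637 893 0 0 40 92% 5% T 93%
"""


def disk_util_values(text):
    """Return the Disk util percentage (last column) of every data row."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 13 fields and begin with a CPU percentage;
        # the two header lines fail one of these checks and are skipped.
        if len(fields) == 13 and re.fullmatch(r"\d+%", fields[0]):
            values.append(int(fields[-1].rstrip("%")))
    return values


def share_at_or_above(values, threshold=80):
    """Fraction of samples whose disk utilization is >= threshold."""
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)


utils = disk_util_values(SYSSTAT_SAMPLE)
print(f"{len(utils)} samples, {share_at_or_above(utils):.0%} at or above 80% disk util")
```

Run against a longer capture, the same two functions give a quick sense of whether the high utilization is sustained or only appears in a handful of samples.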
Ping the GSC, but there is a utility (statit) that will help.
________________________________
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Page, Jeremy
Sent: Wednesday, June 04, 2008 2:32 PM
To: toasters@mathworks.com
Subject: How to identify a hot disk
Run statit and look for a disk that's doing a lot of IOPS (xfers). NetApp is fairly good at distributing load over all the spindles in the volume/aggr, so it's typically not just one hot disk. Did you have a full volume/aggr and add just one disk or two?
-Blake
Jeremy,
Sysstat is just showing you the most utilized disk. You need statit to show you the disk utilization of all disks. Here's how:
Toaster>priv set advanced
Toaster*>statit -b
Toaster*>sysstat -x 1
Wait 30 seconds or so and verify that disk util > 80% most of the time
Toaster*>statit -e
You will get a flood of text (you may need to increase your terminal's scrollback buffer); scroll up until you see something like this:
disk     ut%  xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/sata_aggr0/plex0/rg0:
0f.16      2   1.13   0.18   1.00 21130   0.59  13.49  1467   0.35   8.13   414   0.00   ....     .   0.00   ....     .
0f.17      2   1.26   0.18   1.00 47250   0.72  11.41  1364   0.35   8.13   369   0.00   ....     .   0.00   ....     .
0f.18      4   6.00   5.25   1.37  9595   0.46  14.98  1270   0.29   8.37   491   0.00   ....     .   0.00   ....     .
0f.19      4   5.27   4.72   1.23 10874   0.32  21.44  1308   0.24   9.19   432   0.00   ....     .   0.00   ....     .
0f.20      4   5.03   4.45   1.15 10307   0.36  18.74  1407   0.22   9.89   440   0.00   ....     .   0.00   ....     .
0f.21      4   5.22   4.65   1.12 10573   0.37  18.40  1462   0.20  10.73   556   0.00   ....     .   0.00   ....     .
0f.22      4   5.12   4.55   1.08 11421   0.37  18.33  1505   0.21  10.19   487   0.00   ....     .   0.00   ....     .
0f.23      4   5.09   4.45   1.13 10442   0.39  17.68  1575   0.25   9.30   466   0.00   ....     .   0.00   ....     .
0f.24      4   5.19   4.62   1.19 10679   0.34  20.07  1497   0.23   9.53   531   0.00   ....     .   0.00   ....     .
0f.25      4   5.14   4.48   1.25 10350   0.38  18.00  1624   0.29   8.22   470   0.00   ....     .   0.00   ....     .
0f.26      4   5.17   4.53   1.19 11904   0.35  19.20  1574   0.29   8.73   610   0.00   ....     .   0.00   ....     .
0f.27      4   5.43   4.78   1.05 13052   0.33  20.47  1431   0.32   7.40   720   0.00   ....     .   0.00   ....     .
The first field is the disk and the second field (ut%) is that disk's utilization over the period statit was running. If you want more accurate results, you can run statit for longer.
Finally:
Toaster*>priv set ;)
Toaster>
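To pick the busiest disks out of statit output like the sample above, the disk rows can be parsed and ranked by the ut% and xfers columns. The following is a rough sketch, assuming the disk section has been saved as whitespace-separated text with a raid group path line before its rows; the function names and the 80% cutoff are hypothetical, and only four of the sample rows are reproduced to keep it short.

```python
# A few disk rows copied from the statit sample above (whitespace-separated).
STATIT_SAMPLE = """\
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/sata_aggr0/plex0/rg0:
0f.16 2 1.13 0.18 1.00 21130 0.59 13.49 1467 0.35 8.13 414 0.00 .... . 0.00 .... .
0f.17 2 1.26 0.18 1.00 47250 0.72 11.41 1364 0.35 8.13 369 0.00 .... . 0.00 .... .
0f.18 4 6.00 5.25 1.37 9595 0.46 14.98 1270 0.29 8.37 491 0.00 .... . 0.00 .... .
0f.19 4 5.27 4.72 1.23 10874 0.32 21.44 1308 0.24 9.19 432 0.00 .... . 0.00 .... .
"""


def parse_disk_rows(text):
    """Return (raid_group, disk, ut_percent, xfers) for each disk row."""
    rows = []
    raid_group = None
    for line in text.splitlines():
        fields = line.split()
        # A raid group line is a single path ending in a colon.
        if len(fields) == 1 and fields[0].startswith("/") and fields[0].endswith(":"):
            raid_group = fields[0][:-1]
        elif raid_group is not None and len(fields) >= 3:
            try:
                rows.append((raid_group, fields[0], int(fields[1]), float(fields[2])))
            except ValueError:
                pass  # not a disk row (header or unrelated text)
    return rows


def rank_by_utilization(rows):
    """Sort rows from busiest to idlest by ut%, breaking ties on xfers."""
    return sorted(rows, key=lambda r: (r[2], r[3]), reverse=True)


def hot_disks(rows, min_util=80):
    """Return the rows whose ut% is at or above min_util."""
    return [r for r in rows if r[2] >= min_util]


rows = parse_disk_rows(STATIT_SAMPLE)
for raid_group, disk, util, xfers in rank_by_utilization(rows):
    print(f"{raid_group} {disk} ut%={util} xfers={xfers:.2f}")
print("disks at or above 80% utilization:", [r[1] for r in hot_disks(rows)])
```

On the sample rows nothing comes close to the cutoff, so the final list is empty. On a filer with a genuinely hot disk, one or two rows would stand well apart from the rest of their raid group.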
Most typically a few hot disks in an aggregate are the outcome of adding only a couple of disks and creating a very small raid group. You should do two things in that scenario:
Add more disks until the raid group is fully populated (and make sure raid size = 14 or 16)
Run reallocate on the volumes in the aggregate to spread out the IO against all disks equally
HTH,
Hadrian
You don't really want to run it for longer; you want shorter samples taken across the period of interest. The statit output is an average over the length of time that statit ran, so if you run statit for an hour, you get the average for that hour. I usually take 30-second samples over a period of interesting traffic to figure out what's going on.
-Blake
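A quick arithmetic check shows why a long statit run can hide a short burst. The numbers below are made up for illustration (one hour of 30-second samples with a two-minute burst at 100% utilization); they are not measurements from this filer.

```python
# Hypothetical utilization figures: 120 thirty-second samples covering one hour.
# 116 samples sit at 10% and 4 consecutive samples (two minutes) hit 100%.
samples = [10] * 58 + [100] * 4 + [10] * 58

hour_average = sum(samples) / len(samples)          # what one hour-long run would report
peak_sample = max(samples)                          # what the short samples would reveal
busy_samples = sum(1 for s in samples if s >= 80)   # how many short samples look saturated

print(f"one-hour average: {hour_average:.1f}%")
print(f"busiest 30-second sample: {peak_sample}%")
print(f"samples at or above 80%: {busy_samples} of {len(samples)}")
```

With these assumed figures the hour-long average comes out at 13%, even though the disk was saturated for two full minutes, which is the effect described above.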
On Wed, Jun 4, 2008 at 1:20 PM, Hadrian Baron Hadrian.Baron@vegas.com wrote: