We see this a lot on one of our 3050s (the disks are 144 GB 10k FC-AL), and we are very concerned:
 CPU   Total    Net kB/s     Disk kB/s   Tape kB/s  Cache Cache    CP  CP  Disk
       ops/s     in    out   read  write  read write   age   hit  time  ty  util
 33%    1633  14262   1967   2934   5948     0     0     5   99%   33%   :  100%
 40%    2036  13033   1799  12622   5279     0     0     5   99%   27%   D  100%
 58%    3800  16660   2534  13262  24247     0     0     5   99%  100%   :  100%
 40%    3271  17156   2268  10978  18927     0     0     5  100%  100%   :  100%
 30%    1217   9699   1802   7390  24154     0     0     5   99%  100%   :  100%
 22%    2318    571   2020   8697  21873     0     0     5   93%  100%   :  100%
 20%    1428    706   2014   7663  21820     0     0     5   94%  100%   :  100%
 63%    2728  57180   3438   7852  19920     0     0     5  100%  100%   :  100%
 76%    2787  30640  17043  44909  38679     0     0     5   99%   77%   D  100%
 72%    1995  20426  20208  41305  35765     0     0     5   98%  100%   :  100%
 68%    2049  19219  22977  43481  36810     0     0     5   98%  100%   :  100%
 66%    2234  23097  20839  39816  38096     0     0     5   98%  100%   :  100%
 63%    2352  23897  27837  35960  16489     0     0     5   99%   55%   :  100%
 54%    2478  23635  30499  36404      0     0     0     5   99%    0%   -  100%
 64%    1644  20728   7715  33592  34764     0     0     5   99%   78%   D  100%
 46%    2901  15620   2120  15486  35537     0     0     5   99%  100%   :  100%
 42%    3209  13807   1929  18263  34560     0     0     5   99%  100%   :  100%
 39%    1118  15173   2115  17007  34335     0     0     5   99%  100%   :  100%
 37%    1111  15072   1830   3434  10154     0     0     5   99%   43%   :  100%
 24%    1215  14888   1678   1331      0     0     0     5   99%    0%   -  100%
What could be causing this, and how can we alleviate it?
I'm assuming this is "sysstat" output. I believe "sysstat" shows the utilization of the busiest disk, so the next thing you need to find out is whether all of your disks are busy or just a few. You might want to run "statit" and see how evenly the disk utilization is distributed. If it's not evenly distributed, you would want to consider running "reallocate" to defragment and optimize the WAFL layout. Since "reallocate" actively moves blocks around, it involves a slight performance hit, so you might want to kick it off during a weekend or off hours.
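For anyone following along, the command sequence on Data ONTAP 7G looks roughly like this (a sketch from memory; check the man pages on your release, and /vol/volname is a placeholder for your own volume):

```
filer> priv set advanced
filer*> statit -b                        # begin collecting per-disk statistics
        (wait 30-60 seconds under a representative load)
filer*> statit -e                        # end collection and print the report
filer*> priv set
filer> reallocate measure /vol/volname   # report how fragmented the layout is
filer> reallocate start /vol/volname     # optimize the layout (run off-hours)
```

In the statit disk-statistics section, look at the ut% column: if a few disks sit far above the rest, the load is unevenly distributed.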
If you do find that all of the disks are busy, then you'd probably need more detail about how your RAID groups are laid out, etc., to debug what is going on. It is probably worth opening a case with NetApp, having them walk you through collecting perfstat data, and letting them analyze it.
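For reference, perfstat is a script run from a Unix admin host with rsh/ssh access to the filer. The invocation is along these lines, but the flags vary between perfstat versions, so treat them as placeholders and use exactly what support asks for:

```
# collect 4 iterations of 5 minutes each -- confirm the flags with
# support, since they differ between perfstat versions
perfstat.sh -f filername -t 5 -i 4 > perfstat_filername.out
```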
Steve Brown
No More Linux! wrote:
what could be causing this and how can we alleviate it?
You don't say how many spindles you have, but if I had to guess, I would say that your write activity is pretty high, and WAFL is having a bit of trouble finding places to put the data. Increasing the number of spindles in the active aggregate(s) would help.
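As a rough sanity check on the spindle math, here is the back-of-envelope per-disk load for the busiest interval above (44909 kB/s disk read, 38679 kB/s disk write). This assumes the 5 x 16-disk aggregate described elsewhere in the thread and RAID-DP (2 parity disks per group); adjust if you run RAID4:

```shell
# Per-spindle write load at the peak sysstat interval (an estimate:
# parity disks also absorb parity writes, so real per-disk load is higher).
data_disks=$((5 * (16 - 2)))    # 5 RAID-DP groups of 16 disks -> 70 data spindles
peak_write_kbs=38679            # disk write kB/s from the 76% CPU row
echo "data disks:          $data_disks"
echo "write kB/s per disk: $((peak_write_kbs / data_disks))"
```

A few hundred kB/s per 10k spindle is well below streaming bandwidth, so disks pinned at 100% utilization at this rate suggest seek-bound work: random writes, or WAFL hunting for free space in a nearly full aggregate.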
No More Linux! wrote:
what could be causing this and how can we alleviate it?
Hi,
I have an aggregate with 5 RAID groups of 16 drives each; the total aggregate size is over 8 TB.
On 10/23/07, Andrew Siegel abs@blueskystudios.com wrote:
You don't say how many spindles you have, but if I had to guess, I would say that your write activity is pretty high, and WAFL is having a bit of trouble finding places to put the data. Increasing the number of spindles in the active aggregate(s) would help.
On 10/23/07, No More Linux! no.more.linux@gmail.com wrote:
i have aggregate with 5 raid groups of 16 drives each total aggregate size of over 8TB
what could be causing this and how can we alleviate it?
are you by chance adding new disks one at a time?
...lorib
We have not added any disks in quite some time, and when we do, we always add an entire 16-disk RAID group at a time, never single disks, since that is against NetApp best practice.
The aggregate is 97% full, but it holds only about 8 volumes, the fewest of any of our filers. I will try to see which volume is busiest / writing the most.
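One way to see which volume is generating the writes (7G command forms from memory; verify the counter names with "stats list counters volume" on your release):

```
filer> df -A                               # confirm aggregate fullness
filer> stats show volume:*:write_ops       # per-volume write operations/sec
filer> stats show volume:*:write_data      # per-volume write throughput
```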
All shelves are on one loop; we have no more loops remaining to split them across.
On 10/23/07, Lori Barfield itdirector@gmail.com wrote:
are you by chance adding new disks one at a time?
...lorib
How full are your aggregate and/or volumes? Your CP types include "D", meaning low datavecs, along with long runs of ":", meaning a CP was still in progress from the previous interval; take a look on NOW for explanations of the CP type codes. This definitely looks write-related.
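For quick reference, the sysstat CP type codes are roughly as follows (paraphrased from memory of the sysstat man page; verify against the documentation on NOW):

```
-  no CP started during the interval
T  CP caused by the 10-second timer
F  CP caused by the NVLog filling up
B  back-to-back CP (a new CP started before the previous one finished)
D  CP caused by running low on datavecs (buffers)
:  a CP begun in an earlier interval was still running
```

Long runs of ":" with disks at 100% mean CPs cannot complete within the sample interval, which fits the picture of WAFL struggling to find write locations in a nearly full aggregate.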
Glenn
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of No More Linux!
Sent: Monday, October 22, 2007 4:33 PM
To: toasters@mathworks.com
Subject: disk busy 100%
what could be causing this and how can we alleviate it?