When average disk utilization across all disks in an aggregate becomes high, IO requests are delayed.

My question is: what should this threshold be?

I’ve heard from one performance consultant that 40% disk utilization is where you start seeing latency (this was in regard to an HDS array).

NetApp's TR-3525, "Storage Performance Management Using DFM", says 70% is the industry-accepted disk utilization threshold.

http://www-download.netapp.com/edm/TOT/docs/3525.pdf (page 17)

Here is some interesting EMC FUD comparing a 3050 and a CX3-40. It says the NetApp beats the CX3 until disk utilization passes 20%, after which latency increases dramatically (page 7).

http://www.dell.com/downloads/global/products/pvaul/en/netapp_performance_san.pdf (page 7)

Every environment is different, so the proper answer is, of course, "it depends", and you should test for yourself. It depends on whether you can accept 5ms more latency, or 50ms more. It depends on what kind of IO you do. But my question is: what is *your* threshold, and how do you automate the monitoring of disk utilization?
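
To make the automation part concrete, here is a rough sketch of the kind of check I have in mind. It assumes you already have per-disk percent-busy samples from whatever collector you use. The function names, the sample data, and the disk and aggregate names are all made up for illustration. The 40% and 70% defaults are just the consultant's figure and the TR-3525 figure quoted above, not recommendations.

```python
from statistics import mean

# Assumed defaults, taken from the figures quoted in this post:
# 40% (the consultant's figure) and 70% (the TR-3525 figure).
WARN_PCT = 40.0
CRIT_PCT = 70.0


def aggregate_utilization(disk_util):
    """Average percent-busy across all disks in one aggregate.

    disk_util maps a disk name to its percent-busy sample (0-100).
    """
    if not disk_util:
        raise ValueError("no disk samples for this aggregate")
    return mean(disk_util.values())


def classify(avg_pct, warn=WARN_PCT, crit=CRIT_PCT):
    """Map an average utilization to 'ok', 'warning', or 'critical'."""
    if avg_pct >= crit:
        return "critical"
    if avg_pct >= warn:
        return "warning"
    return "ok"


def check_aggregates(samples, warn=WARN_PCT, crit=CRIT_PCT):
    """samples maps aggregate name -> {disk name: percent-busy}."""
    return {
        aggr: classify(aggregate_utilization(disks), warn, crit)
        for aggr, disks in samples.items()
    }


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    samples = {
        "aggr0": {"0a.16": 35.0, "0a.17": 42.0, "0a.18": 31.0},
        "aggr1": {"0b.16": 78.0, "0b.17": 81.0, "0b.18": 69.0},
    }
    for aggr, status in check_aggregates(samples).items():
        print(aggr, status)
```

Run from cron against real samples, something like this could send an alert whenever an aggregate is not "ok". The open question is still which values to plug in for warn and crit.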

TIA,

Hadrian