Rick Rhodes wrote:
I'm curious just what the latency and queue length are for a lun in a "lun stats" cmd. Is it the latency of servicing a host I/O, or, of writing the I/O's to disk?
Well, it should be the same semantics as for a physical disk pretty much. It also has an avg latency (for any time period you measure when it serves I/O at all) and a queue length, and a max queue depth. The whole idea is that a LUN should logically behave like a physical disk device from an OS PoV, so...
I assume the LUN latency is measured from the point where the op (READ or WRITE) enters some place inside ONTAP; the latency is then the time it takes for the LUN to respond: 'ok, committed to stable storage'. There are lots of layers of abstraction beneath the LUN of course -- basically it's just a big file in a FlexVol (WAFL).
And the queue length is the number of in-flight commands on the LUN: ops sent down to it for which the response hasn't come back yet. How many ops "in parallel" (whatever that means for a LUN) an ONTAP LUN can do, I don't know.
If that parallelism is >1 (it isn't for a physical disk; the write head can only be at one particular place at a time, right?) I'm wrong about this:
If that qlen was 1 instead of 4 you'd be at ~12 ms avg latency on those LUNs instead of 50.
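To spell out the arithmetic behind that estimate, here is a minimal sketch based on Little's law (in-flight ops = throughput x latency). It assumes, and these are my assumptions rather than anything ONTAP documents, that the reported LUN latency includes the time an op waits behind the others in the queue, that every op takes the same ~12 ms of service time, and that the queue stays constantly full. The function name and the `parallelism` parameter are hypothetical, just for illustration.

```python
def avg_latency_ms(service_time_ms, queue_length, parallelism=1):
    """Estimate the average per-op latency via Little's law.

    Assumes a constantly full queue of `queue_length` in-flight ops,
    a fixed service time per op, and a device that works on
    `parallelism` ops at once (1 = strictly serial, like one disk head).
    """
    # Throughput is parallelism / service_time, so by Little's law
    # latency = queue_length / throughput.
    latency = queue_length * service_time_ms / parallelism
    # An op can never complete faster than its own service time.
    return max(float(service_time_ms), latency)


if __name__ == "__main__":
    # Serial LUN, 4 ops in flight, ~12 ms each
    print(avg_latency_ms(12, 4))
    # Same LUN with only 1 op in flight
    print(avg_latency_ms(12, 1))
    # If the LUN really could service all 4 ops in parallel
    print(avg_latency_ms(12, 4, parallelism=4))
```

Under these assumptions a queue length of 4 gives 48 ms, which is close to the ~50 ms observed, and a queue length of 1 gives 12 ms. If the LUN can actually service several ops at once, the estimate drops back toward 12 ms and the queue-length explanation doesn't hold.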
Rick Rhodes wrote:
In other words, I/O's hit the head and get stored in memory (mirrored to the partner head, logged to NVRAM), then are written to disk during a CP.
What I/Os are you talking about now? An I/O hitting a LUN, or physical disk I/O's underneath all those layers of abstraction, deep inside ONTAP?
I could see the queue length being a problem if the host is waiting for the disk writes. I just can't see how it would take ~12ms x 4 = ~50ms to write to the head memory. It's almost like the head is single-threading writes from the host straight through to the backend. Other than back-to-back CP's, I don't understand how this could occur.
Me neither. And I'm prob wrong about that whole thing, as I don't really know how a LUN defines (measures) the latency and queue length. A LUN is essentially a special kind of file in a WAFL, so in some respect it must be basically the same as when you write a block to a file.
I was thinking a reallocate, but since it's only writes I don't see how that will help. The aggregate is 9 TB of which only 3 are used, leaving 6 TB free. I don't see how it's a free-space problem. It has 22 x 600 GB 15K drives in 2 x 11-disk RAID-DP raidsets.
No, I agree... You've got *plenty* of free space, and 15K rpm drives as well. And the box isn't high on CPU load in any way that I've seen from what you've shown. Well, I don't know. Something's fishy here. Your sysstat output does show high disk utilisation, that's clear. It's 40-50% and that's not very pleasant.
This Aggregate is 2 raid groups, with rgsize 11, for a total of 22 spindles. OK they're 15K rpm but still that's not a lot of spindles... What FAS model and ONTAP version is this? *curious*
/M