On 2020-11-25 09:32, Heino Walther wrote:
Hi there We have a system that needs 100k+ IOPS but for some strange reason we cannot get past about 70k IOPS.
That doesn't really mean anything on its own. What about the rest of the essential workload parameters? Is this a sort of micro-benchmark perhaps, counting only the number of IOPS per se? That says pretty much nothing w.r.t. "need".
To paraphrase Jeff Steiner: "What do you really need?"
(I'm not saying you don't know what you're doing here; you just haven't written anything that shows that you do.)
Heino Walther wrote:
This test server(s) is a Windows VM on ESXi, the ESXi host has created a VMFS as Datastore and put a VM image there.
But what does the VM do? Run some sort of micro-benchmark tool? If yes: what IO size is this? And at which latency is 100k IOPS acceptable *and* needed, i.e. 100k IOPS @ x ms, what's x?
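To show why IO size and latency matter here, a minimal Python sketch (my own illustration, not a measurement from any system). It converts an IOPS figure into bandwidth for a given IO size, and uses Little's Law (outstanding IOs = IOPS x latency) to estimate how many IOs must be in flight to sustain that rate. The function names and the example IO sizes and latencies are assumptions picked purely for illustration.

```python
def throughput_mib_s(iops: float, io_size_bytes: int) -> float:
    """Bandwidth in MiB/s implied by an IOPS rate at a fixed IO size."""
    return iops * io_size_bytes / (1024 * 1024)


def required_outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average number of IOs in flight = rate x latency."""
    return iops * (latency_ms / 1000.0)


if __name__ == "__main__":
    target_iops = 100_000  # the "100k IOPS" from the original question

    # Hypothetical IO sizes: the same IOPS number means very different bandwidth.
    for io_size in (4 * 1024, 64 * 1024):
        mib_s = throughput_mib_s(target_iops, io_size)
        print(f"{target_iops} IOPS @ {io_size // 1024} KiB = {mib_s:.1f} MiB/s")

    # Hypothetical latencies: the "x" in "100k IOPS @ x ms".
    for latency_ms in (0.5, 1.0, 2.0):
        in_flight = required_outstanding_ios(target_iops, latency_ms)
        print(f"{target_iops} IOPS @ {latency_ms} ms needs ~{in_flight:.0f} IOs in flight")
```

So "100k IOPS" at 4 KiB and at 64 KiB are entirely different requirements, and the latency you can accept determines how much concurrency the whole IO path has to carry.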
On 2020-11-25 09:32, Heino Walther wrote:
We have tried to create two LUNs on separate volumes, and we can pull out 70k on each, at the same time... which seems to point towards the concurrency...
Yeah, maybe some limitation or other w.r.t. concurrency in the ESXi hosts then? The question is where your bottleneck is, and in what way that is interesting at all in the context of 100k IOPS (again: which IOPS exactly...)
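One way to reason about a concurrency limit: by Little's Law, a queue that allows at most N outstanding IOs at an average latency of L seconds cannot exceed N / L IOPS, no matter how fast the back end is. A rough Python sketch follows; the queue depth of 64 and the 0.9 ms latency are hypothetical numbers chosen only to illustrate the arithmetic, not values measured on your setup.

```python
def iops_ceiling(queue_depth: int, latency_ms: float, queues: int = 1) -> float:
    """Upper bound on IOPS for `queues` independent queues.

    Each queue holds at most `queue_depth` outstanding IOs, and each IO
    takes `latency_ms` on average (Little's Law rearranged: rate = N / L).
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return queues * queue_depth / (latency_ms / 1000.0)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    depth, latency_ms = 64, 0.9

    one_queue = iops_ceiling(depth, latency_ms)
    two_queues = iops_ceiling(depth, latency_ms, queues=2)

    print(f"1 queue : ~{one_queue:,.0f} IOPS max")
    print(f"2 queues: ~{two_queues:,.0f} IOPS max")
```

If a per-LUN or per-adapter queue were the limiter, a second independent queue would bring its own ceiling, which would match the pattern of two LUNs each delivering about 70k at the same time. That is a hypothesis to check, not a diagnosis.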
Heino Walther wrote:
The clusters we have tested up against are an A300 with 24 x 7TB SSD. And an A700 with a mixed setup of shelfs (two disk loops with quad cabling). But even up against the A700 we cannot seem to get past the 70k IOPS.
That could also indicate that your bottleneck is not inside the AFF/ONTAP.
Heino Walther wrote:
Now we have read about the concurrency on a NetApp system, where each volume is assigned a CPU core, which might be our problem..
Your single test LUN (FC-AL?) does in fact live inside a FlexVol, that's true. But no, each FlexVol is not assigned a CPU core in WAFL; it doesn't work like that. There can be bottlenecks inside WAFL, yes, but that is really complex, only shows up as a sort of anomaly, and is not the situation here at all. This is 100% green field IIUYC; in this case you have a totally empty AFF, right? Nothing else is running there. I'm not aware of any limit on bandwidth or IOPS (R or W) that is coupled to the FlexVol's processing mechanics inside ONTAP. It has to be something else that limits you to 70k IOPS per VM in this particular case. Because if you have two VMs, you immediately get 70+70 = 140k, right?
Regards, /M