Hi there
We have a system that needs 100k+ IOPS, but for some strange reason we cannot get past about 70k IOPS. The server is a Windows VM on ESXi with two FC HBAs running 8G (we have adjusted queue depth etc.). The server is connected to the NetApp via two Brocade 6510 switches. The clusters we have tested against are an A300 with 24 x 7TB SSD, and an A700 with a mixed setup of shelves (two disk loops with quad cabling). But even against the A700 we cannot seem to get past 70k IOPS.
Now we have read about concurrency on NetApp systems, where each volume is assigned a CPU core, which might be our problem: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP...
Our storage setup is a volume containing a LUN, which is presented to the host; the host then formats it with VMFS as a datastore… We have tried creating two LUNs on separate volumes, and we can pull 70k out of each at the same time… which seems to point towards the concurrency… But is there a way around this? We would rather have one LUN than multiple LUNs.
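For what it's worth, the per-LUN ceiling behaves like a single I/O queue governed by Little's Law (IOPS = outstanding I/Os / latency), which would also explain why two LUNs scale to roughly double. A minimal sketch; the queue depth and implied latency here are assumptions for illustration, only the 70k figure comes from the tests above:

```python
# Little's Law for a storage queue: IOPS = outstanding_ios / latency.
# Illustrative numbers only: the per-LUN queue depth is an assumption,
# and the latency is simply whatever value makes one queue hit 70k.

def iops(outstanding_ios, latency_s):
    """Achievable IOPS for a queue with a fixed number of in-flight I/Os."""
    return outstanding_ios / latency_s

per_lun_queue_depth = 64           # assumed effective queue depth per LUN
latency = 64 / 70_000              # ~0.91 ms implied per-I/O latency

print(iops(per_lun_queue_depth, latency))      # one LUN: ~70k
print(2 * iops(per_lun_queue_depth, latency))  # two LUNs: ~140k
```

If the model holds, the fix is either more independent queues (more LUNs) or lower per-I/O latency; a deeper queue on the same serialized path only helps until something downstream saturates.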
Any input would be helpful 😊
/Heino
"Heino" == Heino Walther hw@beardmann.dk writes:
Heino> We have a system that needs 100k+ IOPS but for some strange
Heino> reason we cannot get past about 70k IOPS.
I wish I had these types of problems!
Heino> The server is Windows VM on ESXi with two FC HBAs running 8G.
Heino> (we have adjusted queue depth etc.)
What kind of ESXi hardware are you running on?
Heino> The server is connected to the NetApp via two Brocade 6510 switches.
Are the links to the A300/A700 also 8 Gb?
Heino> The clusters we have tested up against are an A300 with 24 x
Heino> 7TB SSD. And an A700 with a mixed setup of shelfs (two disk
Heino> loops with quad cabling). But even up against the A700 we
Heino> cannot seem to get past the 70k IOPS.
Heino> Now we have read about the concurrency on a NetApp system,
Heino> where each volume is assigned a CPU core, which might be our
Heino> problem..
Heino> https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP...
Heino> Our storage setup is a volume with a LUN which is presented to
Heino> the host, which then creates a VMFS as a datastore…
Why not just export an NFS volume instead? Then you get the advantage of being able to grow/shrink your datastore(s) as needed.
And could you export multiple disks from multiple datastores to Windows, and then use RAID0 to join them into one big volume for your application?
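The RAID0 idea above amounts to striping logical blocks round-robin across several underlying LUNs, so each LUN only sees a fraction of the I/Os. A hedged sketch of the address mapping; the function name and the 64 KiB stripe unit are illustrative assumptions, not any real volume manager:

```python
# RAID0-style address mapping: consecutive stripe units alternate
# across the member LUNs. STRIPE_SIZE is an assumption for illustration.

STRIPE_SIZE = 64 * 1024  # 64 KiB stripe unit

def map_block(logical_offset, n_luns):
    """Map a logical byte offset to (lun_index, offset_within_lun)."""
    stripe = logical_offset // STRIPE_SIZE
    lun = stripe % n_luns                    # round-robin LUN choice
    stripes_on_lun = stripe // n_luns        # how many stripes precede it there
    within = logical_offset % STRIPE_SIZE
    return lun, stripes_on_lun * STRIPE_SIZE + within

# Consecutive stripes land on alternating LUNs, so with 2 members
# each LUN handles roughly half the IOPS:
for off in (0, 64 * 1024, 128 * 1024, 192 * 1024):
    print(map_block(off, 2))  # (0, 0), (1, 0), (0, 65536), (1, 65536)
```

Windows Storage Spaces or a striped dynamic volume would do this mapping in the guest; the trade-off is that every member LUN becomes critical to the volume.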
Heino> We have tried to create two LUNs on separate volumes, and we
Heino> can pull out 70k on each, at the same time… which seems to
Heino> point towards the concurrency…
Heino> But is there a way around this? We would rather not have
Heino> multiple LUNs, but rather one LUN.
You're way out of my league performance-wise... *grin* so I don't have anything useful to give in terms of tuning; I'm just wondering if you can turn this problem on its head somehow.
And honestly, if you need this many IOPS, would it make sense to look more closely at ways to break the workload up, or maybe to use RAM disks and flush larger blocks to the NetApp?
Good luck! John
On 2020-11-25 09:32, Heino Walther wrote:
Hi there
We have a system that needs 100k+ IOPS but for some strange reason we cannot get past about 70k IOPS.
That doesn't really mean anything on its own. What about the rest of the essential workload parameters? Is this perhaps a micro-benchmark, only counting the number of IOPS per se? That says pretty much nothing w.r.t. "need".
To paraphrase Jeff Steiner: "What do you really need?"
(I'm not saying you don't know what you're doing here, you just haven't written anything that shows that you do.)
This test server is a Windows VM on ESXi; the ESXi host has created a VMFS datastore and put a VM image there. But what does the VM do? Run some sort of micro-benchmark tool? If yes: which I/O size is this? And at which latency is 100k IOPS acceptable *and* needed, i.e. 100k IOPS @ x ms, what's x?
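The "100k IOPS @ x ms" question matters because latency dictates how much host-side concurrency is required at all: by Little's Law the initiator must keep roughly target_iops × latency I/Os in flight. A small sketch with illustrative latencies (the values below are assumptions, not measurements from this setup):

```python
# Host-side view of Little's Law: in-flight I/Os = IOPS * latency.
# Latencies below are illustrative assumptions.

def required_outstanding(target_iops, latency_s):
    """Concurrent I/Os the host must sustain to reach target_iops."""
    return target_iops * latency_s

for lat_ms in (0.5, 1.0, 2.0):
    need = required_outstanding(100_000, lat_ms / 1000)
    print(f"100k IOPS @ {lat_ms} ms needs ~{need:.0f} outstanding I/Os")
```

So at 1 ms the host needs about 100 I/Os in flight, more than a single LUN path with a typical queue depth of 32 or 64 will carry, which is one plausible reading of the 70k ceiling.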
On 2020-11-25 09:32, Heino Walther wrote:
We have tried to create two LUNs on separate volumes, and we can pull out 70k on each, at the same time... which seems to point towards the concurrency...
Yeah, maybe some limitation w.r.t. concurrency in the ESXi host then? The question is where your bottleneck actually is, and whether that is interesting at all in the context of 100k IOPS (again: which IOPS exactly...).
Heino Walther wrote:
The clusters we have tested up against are an A300 with 24 x 7TB SSD. And an A700 with a mixed setup of shelfs (two disk loops with quad cabling). But even up against the A700 we cannot seem to get past the 70k IOPS.
That could also indicate that your bottleneck is not inside the AFF/ONTAP.
Now we have read about the concurrency on a NetApp system, where each volume is assigned a CPU core, which might be our problem..
Your single test LUN (FC-AL?) is in fact inside a FlexVol in there, that's true. But no, each FlexVol is not assigned a CPU core in WAFL; it doesn't work like that. There can be bottlenecking inside WAFL, yes, but it's really complex, only shows up as anomalies of sorts, and that's not the situation here at all. This is 100% green field IIUYC: you have a totally empty AFF, right? Nothing else is on it. I'm not aware of any limitation in bandwidth or IOPS (read or write) that is coupled to a FlexVol's processing mechanics inside ONTAP. It has to be something else that limits you to 70k IOPS per VM in this particular case. Because if you have two VMs, you immediately get 70 + 70 = 140k, right?
Regards, /M