Hi there

 

I have a two-node Fabric MetroCluster with A300 nodes and four Brocade 6510 switches, with the two sites placed about 10KM from each other.

Our ESXi 7.0 hosts connect via 16G FC using another four front-end Brocade 6510 switches.

Our ESXi hosts can see four paths to each LUN presented to them from the SVMs.

ESXi shows all four paths as Active (I/O), which I find a bit odd, because two of the paths are remote to the ESXi hosts…

Both the ESXi hosts and the igroup on the NetApp are configured for ALUA… but from what I have found by googling, there can be issues with this, and maybe we have those issues…

 

The main issue is that when we run a few performance tests across different datastores (local and remote to the ESXi host), we see OK performance for the local datastores (800+MB/sec.), but when we test against a datastore that is remote to the ESXi host we see only 50-60MB/sec. That is a huge difference and leads us to question the setup…

 

We are aware that writing to a remote datastore in particular involves a transfer to the remote controller (10KM). That controller then has to write the data to its peer (remote) controller (10KM), the peer then has to send an ack. back to it (10KM), and only then is an ack. sent to the ESXi host (10KM). So all in all 40KM, plus waiting for the systems…  It is not ideal, but is this kind of performance normal?
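
To put rough numbers on the distance, here is a back-of-the-envelope sketch in Python. It assumes roughly 5 microseconds of propagation delay per km of fiber, and the 64 KiB I/O size and the single outstanding I/O are hypothetical values, not what our tests actually used. It ignores switch, controller and mirroring latency entirely.

```python
# Rough estimate of what the 40KM of fiber alone costs per write.
# Assumption: light travels through fiber at about 5 microseconds per km.
FIBER_DELAY_US_PER_KM = 5.0


def propagation_delay_us(total_km: float) -> float:
    """Propagation delay in microseconds for the given fiber distance."""
    return total_km * FIBER_DELAY_US_PER_KM


def throughput_ceiling_mb_s(io_size_bytes: int, outstanding_ios: int,
                            latency_us: float) -> float:
    """Upper bound on throughput (MB/s) if each I/O only paid latency_us."""
    ios_per_second = outstanding_ios / (latency_us / 1_000_000)
    return ios_per_second * io_size_bytes / 1_000_000


# 4 hops of 10KM each, as described above.
delay_us = propagation_delay_us(4 * 10)

# Hypothetical workload: 64 KiB writes, one outstanding I/O at a time.
ceiling = throughput_ceiling_mb_s(64 * 1024, 1, delay_us)

print(f"propagation delay per write: {delay_us:.0f} us")
print(f"throughput ceiling from distance alone: {ceiling:.2f} MB/s")
```

Under those assumptions the distance adds about 200 microseconds per write, which would still allow around 327 MB/sec. for a single 64 KiB stream, so the 40KM by itself does not seem to explain 50-60MB/sec.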

The two nodes do have an Ethernet-based cluster interconnect, which is currently linked at 1Gb, and I am beginning to suspect that the data between the two nodes is going via this link. But the more I think about it, the less sense it makes.

 

Our FC inter-switch links are running at 8Gb, but on the switches we see nothing near saturation on any ports… and we of course also checked for port errors of any kind…
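
For comparison, here is a small Python sketch that sets the throughput we measured against the nominal ceilings of the two links. The 125 MB/sec. figure is the raw line rate of 1Gb Ethernet before protocol overhead, and 800 MB/sec. is the nominal per-direction rate usually quoted for 8G FC; the variable and function names are just for illustration.

```python
# Compare measured remote throughput with nominal link ceilings.
GBE_1_RAW_MB_S = 1_000_000_000 / 8 / 1_000_000  # 1Gb Ethernet raw rate = 125 MB/s
FC_8G_NOMINAL_MB_S = 800.0                      # nominal 8G FC rate per direction


def utilisation(observed_mb_s: float, ceiling_mb_s: float) -> float:
    """Fraction of the link ceiling that the observed throughput represents."""
    return observed_mb_s / ceiling_mb_s


# Range we measured against the remote datastore.
observed_remote_mb_s = (50.0, 60.0)

links = {
    "1Gb cluster interconnect": GBE_1_RAW_MB_S,
    "8Gb FC ISL": FC_8G_NOMINAL_MB_S,
}

for name, ceiling_mb_s in links.items():
    low = utilisation(observed_remote_mb_s[0], ceiling_mb_s)
    high = utilisation(observed_remote_mb_s[1], ceiling_mb_s)
    print(f"{name}: {low:.1%} to {high:.1%} of {ceiling_mb_s:.0f} MB/s")
```

The remote numbers come out at roughly 40-48% of a 1Gb link and under 8% of an 8Gb ISL, so the figures alone neither confirm nor rule out the 1Gb link as the path being used.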

 

If anyone has a similar setup, any help would be great… we are doing a few more tests, but we are close to opening a case with NetApp…

 

/B