Thought about that, but we aren’t taking vmsnaps (yet).  We DID run into that bug, however – caused split brain on the ESX clusters during an HA event.  Bad stuff.

 


From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Karlsson Ulf Ibrahim :ULK
Sent: Tuesday, November 04, 2008 6:23 AM
To: toasters@mathworks.com
Subject: SV: Brief outages on the filer?

 

Maybe this, from http://media.netapp.com/documents/tr-3428.pdf (NetApp + VMware storage best practices):

 

When using VMware snapshots (VMsnaps) with NFS Datastores a condition exists where I/O to the VM is suspended while VMsnaps are being deleted (or more technically speaking the process of committing the redo logs to the VMDK files occur). This issue is experienced with any VMware technology that leverages VMsnaps such as VMware Consolidated Backup, Storage VMotion, Scalable Virtual Images, etc. and SnapManager for Virtual Infrastructure from NetApp. VMware has identified this behavior as a bug (SR195302591), and has released patch ESX350-200808401-BG which addresses this bug. At present time, this patch applies to ESX version 3.5, updates 1 and 2 only. If you plan on leveraging any of the applications that require the VMsnap process please apply this patch, and complete its installation requirements, on the ESX servers in your environment. If you are using scripts in order to take on disk snapshot backups and are unable to upgrade your systems, then VMware and NetApp recommend that you discontinue the use of the VMsnap process prior to executing the NetApp snapshot.

/Uffe

 

-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Walker
Sent: November 3, 2008 20:03
To: Page, Jeremy; toasters@mathworks.com
Subject: RE: Brief outages on the filer?

Any way you can predict when it will happen?  Sysstat (or better yet, perfstat) would be of help here.

 

Something I’ve noticed on my infrastructure:  VMware over NFS (unsure about other protocols) will have huge spikes where the guests write lots of data in a quick burst – it happens only a few times a day on relatively quiet systems, but I can definitely see a spike on the filer.  Perhaps you have the same thing going on, just a SWAG…

 

The impact on our side is not really felt – but the filer does go into back-to-back CPs (consistency points) from the massive spike (200MB/s – 350MB/s in a short window), and that could manifest itself as ‘poor disk response time’.

 

In our case, we’re running VMware over NFS and Exchange over iSCSI on the same filers, but no one is really complaining when the ‘events’ happen.  Just something I’ve noticed for a while.

 

We’re on a FAS6070, and the busy periods are recorded at around 6000 NFS IOPS.  That said, we did a stress test with about 25 guests running IOMeter and were able to push 15000 NFS OPS on node 1 and 10000 NFS OPS on node 2 (a combined 400MB/s write, 300MB/s read) without any sort of reported performance problems.
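As a rough sanity check on the stress-test figures, the average payload per NFS operation can be worked out from the numbers quoted. This is only a back-of-envelope sketch: it assumes 1 MB = 10^6 bytes, that the "combined" throughput covers both nodes, and that every NFS op carried data (in practice some are metadata ops, so the true size per data op would be larger).

```python
# Back-of-envelope: average bytes moved per NFS op during the stress test.
# Assumptions (not stated in the thread): 1 MB = 10**6 bytes, the combined
# throughput figure spans both nodes, and all ops are counted as data ops.

node1_ops = 15_000   # NFS ops/s on node 1
node2_ops = 10_000   # NFS ops/s on node 2
write_mb_s = 400     # combined write throughput, MB/s
read_mb_s = 300      # combined read throughput, MB/s

total_ops = node1_ops + node2_ops
total_bytes_s = (write_mb_s + read_mb_s) * 10**6
avg_bytes_per_op = total_bytes_s / total_ops

print(f"{total_ops} ops/s, ~{avg_bytes_per_op / 1000:.0f} KB per op on average")
```

Under those assumptions the test averaged roughly 28 KB per op across 25000 ops/s.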

 


From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Page, Jeremy
Sent: Monday, November 03, 2008 11:02 AM
To: toasters@mathworks.com
Subject: Brief outages on the filer?

 

I am seeing brief outages where my VMs (NFS as the back-end protocol) and SQL LUNs (FC) both complain of poor disk response time at the same time. I don’t think it can be the infrastructure, since one is IP and the other FC. The LUNs are on a different set of spindles and a different aggr than the NFS volumes as well, so I don’t think it’s a disk bottleneck. I’m on a 3070, and we rarely hit 3500 IOPS (with 90+% of that out of cache) or go above 40% for the busiest CPU (normally we’re in the 15-25% range), so I am not sure what’s going on here. Any suggestions on how to troubleshoot it?

 

We’re running 7.2.4; I want to wait for 7.3.1 before upgrading, since we are using NFS for VMware and there are several fixes that will be beneficial to us.
