I'm looking for experiences people out there may have had with SMVI on NetApp. We're currently experiencing major issues with SMVI snapshots failing. I've had open tickets with NetApp/VMware/Microsoft for 3 months and still have no solution.
My environment looks like this:

* 6 x HP DL380 G5 (32 GB RAM) in an ESX cluster
* Dual Emulex 10000 cards in each host
* Cisco MDS SAN
* NetApp FAS3070 cluster, ~9 TB aggregate for VMware
* VMFS datastores, ~10-15 VMs per datastore, ~50 GB per VM
* ASIS turned on
* Volume and LUN space reservation turned off
* ONTAP 7.2.5.1
* Windows 2003 guest OS
I can't see us reaching any limit on the filers or the SAN, yet random VMs fail snapshots every night. Are other people seeing these issues? (I've run the gamut of troubleshooting: version management of ESX, VMware Tools, etc.) Snapshots time out and fail at the VMware/guest level, not at the NetApp snapshot level.
We want to have SMVI function with VSS enabled.
Has anyone who had failing snapshots been able to resolve a similar issue? Or does anyone have SMVI working properly that we could use as a reference to compare configurations?
__________________________________________________________ Ken Williams Storage Administrator, Business Technology Operations Sacramento Municipal Utility District E-Mail: kwillia@smud.org Phone: (916) 732-6744 Cell: (916) 240-4213
Ken --
We too are experiencing issues with SMVI 1.2 bombing out when attempting to perform a VMware quiesce snap on _some_ RHEL5.3 32-bit VMs. A couple of notables:
- These problematic VMs have a 100% success rate at taking VMware quiesce snaps within vCenter, independent of SMVI.
- The problem is 100% reproducible during night, day, etc.
- We will deploy several VMs at a crack, all the same build. When the next SMVI schedule hits, some fail while others succeed. Bizarre.
- Over time (we've been experiencing this issue for several weeks now), the 'problem' VMs change. Example: VMs abc and xyz will fail for days; without intervention, VM abc will stop failing while VM xyz continues to fail ... even if they're part of the same deploy base template/kickstart.
- We are nowhere near our snap limit on the volumes.
- These problematic VMs only bomb when attempting a quiesce. Non-quiesce SMVI snaps work like a champ.
Been working with NetApp and VMware for some time now. We're at ESX 3.5u4+ to a 3160-R5 @ 7.2.6.1P3 via NFS + vCenter 4.0 + sync SnapMirror to another 3160-R5 @ 7.2.6.1P3. The only revealing thing is the SMVI + vCenter logs, which report "cannot create a quiesced snapshot because the (user-supplied) custom prefreeze script in the virtual machine exited with a nonzero return code".
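For anyone wanting to track how the set of failing VMs shifts from night to night, here is a rough sketch of tallying that error per VM. Only the error text comes from the logs quoted above; the line layout (`<day> <vm_name> <message>`), the function name and the sample entries are made up for illustration and will not match real SMVI or vCenter log formats without adjustment.

```python
from collections import defaultdict

# Error text as quoted from the SMVI/vCenter logs; the rest of the
# line layout below is an assumption, not the real log format.
QUIESCE_ERROR = (
    "custom prefreeze script in the virtual machine "
    "exited with a nonzero return code"
)


def failures_by_vm(log_lines):
    """Return {vm_name: sorted list of days on which the quiesce error appeared}.

    Assumes each line looks like '<day> <vm_name> <message>' (hypothetical).
    """
    seen = defaultdict(set)
    for line in log_lines:
        if QUIESCE_ERROR not in line:
            continue  # not the failure we are tracking
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # malformed line, skip it
        day, vm = parts[0], parts[1]
        seen[vm].add(day)
    return {vm: sorted(days) for vm, days in seen.items()}


# Made-up sample data mirroring the abc/xyz example above.
msg = "cannot create a quiesced snapshot because the (user-supplied) " + QUIESCE_ERROR
sample = [
    f"day1 abc {msg}",
    f"day1 xyz {msg}",
    "day1 def snapshot completed",
    f"day2 abc {msg}",
    f"day2 xyz {msg}",
    "day3 abc snapshot completed",
    f"day3 xyz {msg}",
]

for vm, days in sorted(failures_by_vm(sample).items()):
    print(vm, days)
```

Comparing the per-VM day lists across a few weeks would at least show whether a VM's failures stop on their own or line up with some change on the host or filer.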
-- Nick