It's much appreciated. I'm at my wits' end. I've been working with
VMware/NetApp/Microsoft for quite some time on this issue.
ESX 3.5 Update 4.
Not always the same VMs, and low-load VMs at that.
Yes, it is the 15-minute timeout window.
-----Original Message-----
From: Klise, Steve [mailto:klises@pamf.org]
Sent: Wednesday, August 26, 2009 5:50 PM
To: Ken Williams; toasters(a)mathworks.com
Subject: Re: SMVI / VMWare Experiences...
The comment was more of a troubleshooting step. My bad.
I have had problems with "busy" servers. I had to stop the
"busy-making" service. For example, I am running DFM 3.8 and I have to
stop the DB service before the snapshot. That seemed to band-aid the
problem.
What version and release of ESX are you on?
Is it always the same VMs that fail?
Is it the 15-minute timeout during the snapshot?
----- Original Message -----
From: Ken Williams <kwillia(a)smud.org>
To: Klise, Steve; toasters(a)mathworks.com <toasters(a)mathworks.com>
Sent: Wed Aug 26 17:18:12 2009
Subject: RE: SMVI / VMWare Experiences...
Thank you for the input.
I disagree with the "If you can do a VM snapshot, then it's an issue
with SMVI." statement. VM snapshots do not perform the same functions
as an SMVI snapshot call to the ESX API (as per VMware Technical
Support). This is definitely a communication issue between VSS, the
guest OS, and the ESX host. Or some greater misconfiguration...
-----Original Message-----
From: Klise, Steve [mailto:klises@pamf.org]
Sent: Wednesday, August 26, 2009 3:09 PM
To: Ken Williams; toasters(a)mathworks.com
Subject: RE: SMVI / VMWare Experiences...
A couple of things you have already hit on, but I will regurgitate:
* Make sure you have the latest tools installed WITH THE VSS OPTION. A
  reboot is required.
* Check for any SMVI snapshots. We run a morning monitoring report that
  has this. It's great, and anyone running ESX should use it.
* I have had issues with timeouts. If you can do a VM snapshot, then
  it's an issue with SMVI. If you can't, you need to start there.
* I have seen issues with older 2.5.x and 3.x that needed the hardware
  upgraded on the VM.
* Check disk timeouts.
Here are a couple of other things I ran across:
Solution
SnapManager for VI utilizes an internal database to keep track of these
locks and provides persistence across reboots. Simply rebooting the
SnapManager for VI host will not clear these locks.
If you want to remove all currently running tasks in SMVI, perform the
following:
1. Stop the SnapManager for VI service.
2. Remove the <SMVI dir>/server/crashdb directory.
3. Start the SnapManager for VI service.
Performing these steps will not affect the scheduled jobs or remove
them from the interface. It will kill and remove any outstanding or
in-process tasks.
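
As a rough sketch only, the three steps above could be scripted along
these lines in Python. The SMVI install directory and the Windows
service name are placeholders assumed for illustration, not values from
the article quoted above, so check the real service name on the SMVI
host first. The sketch also assumes a Windows host where "net stop" and
"net start" are available.

```python
import shutil
import subprocess
from pathlib import Path


def clear_smvi_tasks(smvi_dir, service_name, run=subprocess.check_call):
    """Stop the service, remove <smvi_dir>/server/crashdb, restart the service.

    smvi_dir and service_name are supplied by the caller (assumed values,
    not taken from SMVI documentation). `run` executes a command list and
    can be swapped out for a dry run.
    Returns True if a crashdb directory was found and removed.
    """
    crashdb = Path(smvi_dir) / "server" / "crashdb"

    # Step 1: stop the service so nothing holds the lock database open.
    run(["net", "stop", service_name])
    try:
        # Step 2: remove the crashdb directory if it is present.
        removed = crashdb.is_dir()
        if removed:
            shutil.rmtree(crashdb)
    finally:
        # Step 3: start the service again, even if the removal failed.
        run(["net", "start", service_name])
    return removed
```

Passing run=print gives a dry run of the service commands, but note
that the crashdb directory is still removed if it exists, so point
smvi_dir at a scratch copy when trying this out.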
________________________________
From: owner-toasters(a)mathworks.com on behalf of Ken Williams
Sent: Wed 8/26/2009 2:32 PM
To: toasters(a)mathworks.com
Subject: SMVI / VMWare Experiences...
I'm looking for experiences people out there may have had running SMVI
with NetApp. We're currently experiencing major issues with SMVI
snapshots failing. I've had open tickets with NetApp/VMware/Microsoft
for 3 months and have yet to get a solution.
My environment looks like this:
* 6 x HP DL380 G5 (32 GB RAM) in an ESX cluster
* Dual Emulex 10000 cards in each host
* Cisco MDS SAN
* NetApp FAS3070 cluster, ~9 TB aggregate for VMware
* VMFS datastores, ~10-15 VMs per datastore, ~50 GB per VM
* ASIS turned on
* Volume and LUN space reservation turned off
* ONTAP 7.2.5.1
* Windows 2003 guest OS
I can't see us reaching any limitation on the filers or the SAN. Yet we
have random VMs failing snapshots every night. Are other people seeing
these issues? (I've gone through the gamut of troubleshooting, version
management of ESX/VMware Tools/etc.) Snapshots time out and fail at the
VMware/guest level, not at the NetApp snapshot level.
We want to have SMVI function with VSS enabled.
Has anyone who had failing snapshots been able to resolve a similar
issue? Or does anyone have SMVI working properly that we could use as a
reference to compare configurations?
__________________________________________________________
Ken Williams
Storage Administrator, Business Technology Operations
Sacramento Municipal Utility District
E-Mail: kwillia(a)smud.org
Phone: (916) 732-6744
Cell: (916) 240-4213