Solved (kind of):
It looks like disabling IPSec on the router interfaces and running the link unencrypted between the datacenters fixed the problem.
Further analysis revealed that all the mirrors had connection issues. The ones that kept working just had much smaller amounts of data to update, so they eventually succeeded, whereas the large baseline transfers for the new volumes never completed.
So my question becomes:
Is there anything in the Cisco IPSec config or the NetApp ONTAP config that can be tweaked to ensure trouble-free mirroring over a secured link?
Cheers, Raj.
On Sat, May 3, 2008 at 8:59 PM, Raj Patel phigmov@gmail.com wrote:
We've got two FAS270s in different cities. They're connected by a 10 Mb pipe, with routers (running IPSec) and firewalls (Check Point SPLAT) separating each datacenter.
The primary SAN (ONTAP 7.0.5) is fine and runs all our prod volumes, which are mirrored to our secondary SAN (ONTAP 7.0.6).
Recently I had to recreate the mirror relationships for some volumes because they'd fallen far out of sync during some firewall work.
What I'm seeing is that one volume syncs fine, one has a small lag, and two are stuck with a status of 'Pending with restart checkpoint' after I re-initialised the transfer.
snapmirror status -l shows the following for one of the two that never get properly initialised:
    Source: 10.1.45.7:sqlprod01
    Destination: adcsan1:sqlprod01_mirror
    Status: Pending with restart checkpoint
    Progress: 38376 KB
    State: Unknown
    Lag: -
    Mirror Timestamp: -
    Base Snapshot: -
    Current Transfer Type: Retry
    Current Transfer Error: volume is not online; cannot execute operation
    Contents: -
    Last Transfer Type: -
    Last Transfer Size: -
    Last Transfer Duration: -
    Last Transfer From: -
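When several relationships are involved, it can help to pick out the stuck ones from saved status output rather than eyeballing it. The sketch below is a hypothetical Python 3 helper (parse_status and stalled are illustrative names, not part of ONTAP); it assumes the output was captured with one "Field: value" pair per line and that each relationship block starts with a "Source:" line.

```python
from __future__ import annotations

# Hypothetical helper for saved `snapmirror status -l` output.
# Assumption: one "Field: value" pair per line, and every relationship
# block begins with a "Source:" line.

STALLED_STATUS = "Pending with restart checkpoint"


def parse_status(text: str) -> list[dict[str, str]]:
    """Return one dict per SnapMirror relationship found in the text."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if ": " in line:
            # Split on the first ": " only, because values such as
            # "10.1.45.7:sqlprod01" contain colons themselves.
            key, _, value = line.partition(": ")
        elif line.endswith(":"):
            key, value = line[:-1], ""  # field printed with no value
        else:
            continue  # blank lines and headers such as "Snapmirror is on."
        if key == "Source" and current:
            records.append(current)  # a new relationship starts here
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def stalled(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only relationships stuck waiting to restart a transfer."""
    return [r for r in records if r.get("Status") == STALLED_STATUS]


if __name__ == "__main__":
    sample = """\
Source: 10.1.45.7:sqlprod01
Destination: adcsan1:sqlprod01_mirror
Status: Pending with restart checkpoint
Progress: 38376 KB
State: Unknown
Current Transfer Type: Retry
Current Transfer Error: volume is not online; cannot execute operation
"""
    for rec in stalled(parse_status(sample)):
        print(rec["Destination"], "->", rec.get("Current Transfer Error", ""))
```

Run against the sample above, it prints the destination volume together with its "volume is not online" error. Lines it doesn't recognise are skipped rather than treated as errors, so headers or blank lines in a captured console session shouldn't break it.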
Our firewall rules have been relaxed to allow free flow between these devices (instead of just the SnapMirror ports), and the routers and circuit haven't changed at all between when it worked and now. The volume that is mirroring OK still syncs fine, although its updates are small, whereas the three non-working volumes have to sync quite a lot of data.
I've tried deleting the mirrored volumes, recreating them, setting up the mirror relationships again (with a variety of scheduling and bandwidth-throttling options) and rebooting the destination SAN.
What are the best options for troubleshooting this or ensuring a successful mirror? Has anyone had issues with dropped or stalled SnapMirror baseline transfers through an IPSec tunnel or firewall?
Thanks in advance, Raj.
PS: As an addendum, it looks like it starts a transfer, stalls, and from then on subsequent mirror updates fail because the volume is not online (i.e. the initialisation failed?).
What I don't understand is why it can't just carry on with the initialisation after the interruption by resuming the mirror operation.