Hi all

I don't see a bug which is a precise match to this, but I do see that both scenarios were using 7.0.x releases, and I see a fair few SnapMirror bugs have been fixed in 7.2.4; so I am wondering if in either of the scenarios it is possible to move both filers to 7.2.4 (I semi-fear it isn't especially for the source filers concerned) and/or if anyone has seen this on a 7.2.x release.

cheers
Kenneth

http://now.netapp.com/NOW/cgi-bin/relcmp.on?&rrel=7.0.6&rrel=7.2.4&what=fix

> Subject: RE: Oddball SnapMirror issue
> Date: Sun, 4 May 2008 13:24:05 -0500
> From: mpartyka@acmn.com
> To: tmacmd@gmail.com; owner-toasters@mathworks.com; phigmov@gmail.com; toasters@mathworks.com
>
> Is there any reason to prefer wafliron over WAFL_check? Sounds like they
> do the same thing but you have the option to only check not
> automatically fix with WAFL_check.
>
> -Mike
>
> -----Original Message-----
> From: tmacmd@gmail.com [mailto:tmacmd@gmail.com]
> Sent: Sunday, May 04, 2008 12:59 PM
> To: Mike Partyka; owner-toasters@mathworks.com; Raj Patel; NetApp
> Toasters List
> Subject: Re: Oddball SnapMirror issue
>
> I would try a wafl iron on the source volume/aggr
>
> Just because you do not see any filesystem problems, does not mean ther
> are not any.
>
> --tmac
>
> Sent from my Verizon Wireless BlackBerry
>
> -----Original Message-----
> From: "Mike Partyka" <mpartyka@acmn.com>
>
> Date: Sun, 4 May 2008 09:28:18
> To:"Raj Patel" <phigmov@gmail.com>, <toasters@mathworks.com>
> Subject: RE: Oddball SnapMirror issue
>
>
> I'm having a similar experience trying to setup a Snapmirror between a
> pair of filers in the same datacenter (Not separated by a firewall). The
> source is a 3050 running DOT 7.0.5 and the destination is a 270 running
> 7.0.6. The volume is a 420G volume serving unstructured CIFS data. When
> I start the initialize everything works fine until it gets to about 82
> or 83G, then the initialize aborts. The log contains some very
> non-specific messages, here is the current snapmirror log:
>
> sys Sat May 3 09:12:55 CDT SnapMirror_off (shutdown)
> log Sat May 3 09:15:31 CDT FILER_REBOOTED
> sys Sat May 3 09:15:34 CDT SnapMirror_on (registry)
> dst Sat May 3 10:09:36 CDT 10.0.10.238:data hci2:rcv_data Request
> (Initialize)
> dst Sat May 3 10:09:42 CDT 10.0.10.238:data hci2:rcv_data Start
> dst Sat May 3 11:51:24 CDT 10.0.10.238:data hci2:rcv_data Abort
> (snapmirror transfer failed to complete)
>
> Just as the Raj says when it fails to initialize the destination volume
> is in limbo, you can't online it due to the failed initialize. Here is
> the error:
>
> vol online: Volume 'rcv_data' was left in an inconsistent state by an
> aborted vol copy or an aborted snapmirror initial (level 0) transfer.
> In order to bring it online, you must either destroy and re-create
> the volume, or complete an initial snapmirror transfer or vol copy.
>
> I have considered running WAFL_check but WAFL isn't reporting an
> inconsistent state so i'm not sure that would be very effective.
> Yesterday I upgraded both filers to DOT 7.2.4 and updated all firmware
> then retried with the exact same results.
>
> The only thing I can think of doing now is running a packet capture on
> the filer while it runs and see what that tells me.
>
> -Mike
>
> -----Original Message-----
> From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com]
> On Behalf Of Raj Patel
> Sent: Sunday, May 04, 2008 1:29 AM
> To: George T Chen
> Cc: toasters@mathworks.com
> Subject: Re: Oddball SnapMirror issue
>
> Hi George,
>
> The working transfers do just update 10 to 20Mb - very small turnover.
>
> Unfortunately the two I need to mirror are from scratch - no baseline
> snapshot. The checkpoint restart occurring during the initialisation
> phase. Once the initialisation phase stalls further updates fail as
> the volume is not online (obviusly because the init failed).
>
> I tried setting a once-a-day schedule at a particular time so it
> wouldn't trip over itself or other snapmirror operations to no avail.
>
> As other volumes are updating with small update it made me wonder if
> it wasn't the router ipsec tunnel or firewall prematurely closing a
> connection for a large baseline transfer.
>
> I'll attach the log & config when I get back into work.
>
> Cheers,
> Raj.
>
> On Sun, May 4, 2008 at 4:36 PM, George T Chen <gtchen@yahoo-inc.com>
> wrote:
> > Since you have one volume already transferring, then there's no
> network
> > or firewall issue--any problem at that level would affect all
> volumes,
> > not just a few.
> >
> > A "Pending with restart checkpoint" appears you abort an ongoing
> > transfer. Checkpoint occur every ?? megabytes and gives Ontap a
> place
> > to restart instead of from scratch. It's hard to debug without more
> > info, but I would start by:
> >
> > 1) doing a snapmirror break on the volume (not just an abort)
> > 2) verify that there is a common baseline snapshot on both source and
> > destination
> > 3) restart with a snapmirror resync command
> >
> > Depending on step 2, you may be required to go to a snapmirror
> > initialize.
> >
> > What do the /etc/log/snapmirror and /etc/messages file say?
> >
> > -gtchen
> >
> >
> >
> > > -----Original Message-----
> > > From: owner-toasters@mathworks.com
> > [mailto:owner-toasters@mathworks.com]
> > > On Behalf Of Raj Patel
> > > Sent: Saturday, May 03, 2008 2:00 AM
> > > To: toasters@mathworks.com
> > > Subject: Oddball SnapMirror issue
> > >
> > > We've got two FAS 270's in different cities. They're connected by a
> > > 10mb pipe with routers (running ipsec) & firewalls (checkpoint
> splat)
> > > seperating each datacenter.
> > >
> > > The primary san is fine and runs all our prod volumes (7.0.5) which
> > > are mirrored to our secondary san (7.0.6).
> > >
> > > Recently I had to recreate the mirror relationship for some volumes
> as
> > > they'd fallen far out of sync due to some firewall work.
> > >
> > > What I am seeing is one volume is syncing fine, one has a small lag
> > > and two are stuck with a status of 'Pending with restart
> checkpoint'
> > > after I re-initialised the transfer.
> > >
> > > snapmirror status -l shows this for one of the two that just don't
> get
> > > properly initialised
> > >
> > > Source: 10.1.45.7:sqlprod01
> > > Destination: adcsan1:sqlprod01_mirror
> > > Status: Pending with restart checkpoint
> > > Progress: 38376 KB
> > > State: Unknown
> > > Lag: -
> > > Mirror Timestamp: -
> > > Base Snapshot: -
> > > Current Transfer Type: Retry
> > > Current Transfer Error: volume is not online; cannot execute
> operation
> > > Contents: -
> > > Last Transfer Type: -
> > > Last Transfer Size: -
> > > Last Transfer Duration: -
> > > Last Transfer From: -
> > >
> > > Our firewalls rules have been relaxed to allow free-flow between
> these
> > > devices (instead of just the SnapMirror ports) and the routers and
> > > circuit haven't changed at all between it working fine and not
> working
> > > now. The volume that is mirroring OK seems fine and still syncs
> fine -
> > > granted the updates are small whereas the three non-working volumes
> > > have to sync quite a lot of data.
> > >
> > > I've tried deleting the mirrored volumes, recreating them, setting
> up
> > > the mirror relationship again (with a variety of scheduling and
> > > bandwidth throttling options) and doing a destination SAN reboot.
> > >
> > > What are the best options to troubleshoot this or insuring a
> > > successful mirror ? Has anyone had issues with dropped or stalled
> > > SnapMirror baseline transfers via an IPSec tunnel or Firewall ?
> > >
> > > Thanks in advance,
> > > Raj.
> > >
> > > PS As an addendum it looks like it starts a transfer, stalls and
> from
> > > then on subsequent mirrors fail because its not online (ie the
> > > initialisation fails ?)
> > >
> > > What I don't understand is why it just can't carry on with the
> > > initialisation regardless of the interruption by resuming the
> mirror
> > > operation ?
> >
>
>

Express yourself instantly with MSN Messenger! MSN Messenger