Because I'm an idiot, I didn't even bother to provide useful details on my Filers. Basically, we've got an R200 on one coast with Data ONTAP 7.0.3P3 installed. I've got an FAS960 also running 7.0.3P3 on the other coast. The 960 is snapvaulting a bunch of qtrees to the R200.
So after I sent out my plea for help (and opened a bug with NetApp) I tried doing something silly. I noticed that the snapvault retry count was set to 2, the default. I saw this with:
r200> snapvault modify <dest>
So I decided, since I had nothing to lose, that maybe I should raise the retry limit. We have been doing various WAN network tests, and those might have broken things. So I did:
r200> snapvault modify -t 10 <dest>
It hung for a while, say 30-60 seconds. Made me wonder if I had hosed the R200 and made it reboot or something. Came back and my status had changed from "Quiescing" to "Idle". Excellent!
So I was then able to do:
r200> snapvault update <dest>
And hey, it started transferring data again. All the other snapvaults still seemed to be stuck, but this one was going again. I gave it somewhere between 5 and 15 minutes and then noticed that <dest> was stuck again, but this time with a Status of "Pending" and an Error message of "too many active transfers at once on the source". Looking at the FAS960, all the other stuck snapvaults were now transferring data to the R200.
It's still too early to know if this will actually help me, since none of them have shown a drop in lag time yet, though some are back in the Quiescing state. We'll see what happens.
Hmm... some are Idle on the Source, but not on the Destination. So it looks like we've managed to kick them up again. Very good.
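To recap, the sequence on the R200 boiled down to roughly the following. Treat it as a sketch of what worked once here, not a known fix. The "snapvault status" line is an addition for completeness: it is the usual way to watch the State, Lag and Status columns, but it is not one of the commands shown above. <dest> is still just a placeholder for the destination qtree, and 10 is an arbitrary retry count, not a recommended value.
r200> snapvault status
r200> snapvault modify -t 10 <dest>
r200> snapvault update <dest>
If the update then goes to "Pending" with "too many active transfers at once on the source", that seems to mean only that the source is busy with the other relationships, as happened here.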