I thought I'd toss this one up for grabs, whilst I try to find support people to talk to.
I've got a BudTool 4.6, clustered F760, DOT 5.2.1 setup here, and I'm having the dickens of a time getting my filers backed up via NDMP.
Ordinarily I have file history turned on. It would appear that because of this, I'm ramming head on into NetApp BugID 9378. I do in fact see a large pause between pass III and pass IV. A 2 hour pause.
This is a problem, as 2 hours later, the backup dies with the following message in the logs:
Command exceeded maximum allowable run time 14400.
i.e., 4 hours.
Questions I have:
* Has anybody found a way to circumvent the multiple-hour pause during a BudTool NDMP backup, _while keeping file history on_ ?
* Anybody seen the `killing job after 4 hours' message? Anybody know a workaround?
While highly sub-optimal, I have a backup running with no file history right now. I'm concerned that that 4 hour thing is gonna bite me in the butt again.
Suggestions? Comments? Shotgun shells to rid myself of this headache?
josh
We had this problem for a while, and we were told to upgrade to DOT 5.3.2D4 and BudTool 4.6.1. This stopped the pauses, but introduced a buffer overflow that makes the NetApp dump core whenever it hits a "null DOS condition" while trying to back up. We aren't really sure what a "null DOS condition" is, but it makes us unhappy when we hit it. We were told by NetApp that they are working on a fix, but the bug hasn't even shown up in the NOW database yet... YMMV.
Paul Taylor
Sr. Systems Engineer
Catalyst Solutions Group
215-841-5540
On Wed, 29 Sep 1999, Josh Tiefenbach wrote:
> I thought I'd toss this one up for grabs, whilst I try to find support people to talk to.
> I've got a BudTool 4.6, clustered F760, DOT 5.2.1 setup here, and I'm having the dickens of a time getting my filers backed up via NDMP.
> Ordinarily I have file history turned on. It would appear that because of this, I'm ramming head on into NetApp BugID 9378. I do in fact see a large pause between pass III and pass IV. A 2 hour pause.
> This is a problem, as 2 hours later, the backup dies with the following message in the logs:
> Command exceeded maximum allowable run time 14400.
> ie, 4 hours.
> Questions I have:
> - Has anybody found a way to circumvent the multiple-hour pause during a
>   BudTool NDMP backup, _while keeping file history on_ ?
> - Anybody seen the `killing job after 4 hours' message? Anybody know a
>   workaround?
> While highly sub-optimal, I have a backup running with no file history right now. I'm concerned that that 4 hour thing is gonna bite me in the butt again.
> Suggestions? Comments? Shotgun shells to rid myself of this headache?
> josh
-- This is my .sig! There are many others like it, but this one is mine!
Josh Tiefenbach wrote:
> ... A 2 hour pause.
> This is a problem, as 2 hours later, the backup dies with the following message in the logs:
> Command exceeded maximum allowable run time 14400.
> ie, 4 hours.
I'm not sure if this is what you're looking for, but we had a filesystem that was taking longer than 21 hours to back up, so BudTool killed the dump. Then it would start over again. And again, and again ...
We went into $BTHOME/bud/goserver.end.filter and made a change:
# diff goserver.end.filter.05-24-99 goserver.end.filter
199c199
< param("BT_REQ_CMDTIMEOUT","78800");
---
> param("BT_REQ_CMDTIMEOUT","157600");
#
Then the job lived long enough to finish. Perhaps this will help you, but I'm not sure, since your 4 hour life-to-live is much smaller than the default 21 hours, so maybe it's coming from somewhere else ... YMMV.
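For anyone comparing the numbers in this thread, here is a small Python sketch that converts the three second counts quoted so far into hours and pulls the limit out of a "Command exceeded ..." log line. The helper names (`seconds_to_hours`, `parse_run_time_limit`) are made up for illustration and are not part of BudTool; the only inputs are the values and the log message quoted in the posts above.

```python
import re

# Timeout values quoted in this thread, in seconds.
TIMEOUTS = {
    "limit reported in josh's log": 14400,
    "BT_REQ_CMDTIMEOUT before the edit": 78800,
    "BT_REQ_CMDTIMEOUT after the edit": 157600,
}


def seconds_to_hours(seconds):
    """Convert a timeout in seconds to hours."""
    return seconds / 3600.0


def parse_run_time_limit(log_line):
    """Return the limit in seconds from a 'Command exceeded ...' line, or None."""
    match = re.search(
        r"Command exceeded maximum allowable run time (\d+)", log_line
    )
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for label, secs in TIMEOUTS.items():
        print(f"{label}: {secs} s = {seconds_to_hours(secs):.1f} h")
```

Run as a script, this reports 4.0 h, 21.9 h and 43.8 h respectively. The edited value is exactly twice the old one, and neither of them is the 14400 seen in the failing job's log.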
-ste
> Then the job lived long enough to finish. Perhaps this will help you, but I'm not sure, since your 4 hour life-to-live is much smaller than the default 21 hours, so maybe it's coming from somewhere else ... YMMV.
Yeah. I saw that. I was kinda thrown by the 78800 vs 14400 issue. Tho, upon further reflection, the following may have something to do with it:
In my goserver logs, I have a whole bunch of "errors on log filter program" messages, and something like
unexpected log filter output:bash: filter~header: command not found
showing up all over the place. In looking at ${BTHOME}/bud/goserver.end.filter I see the line
$1 == "filter~header" {source=$2;linecount=$3;next}
about 15 lines after the param(BT_REQ_CMDTIMEOUT,...) line. Not being an awk guru, I have no idea what this is doing, but I suspect that this error is causing the entire goserver.end.filter to fail, and thus the 21 hour timeout is not being applied.
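For what it's worth, the quoted line is an ordinary awk pattern-action rule: when the first field of an input record is the literal string filter~header, it stores the second and third fields in the variables source and linecount, and `next` skips straight to the next record. The Python sketch below mimics just that one rule. It assumes awk's default whitespace field splitting (the real filter may set FS differently), and the sample input is invented for illustration. It says nothing about why bash, rather than awk, ended up seeing filter~header as a command.

```python
def apply_header_rule(lines):
    """Mimic the awk rule:  $1 == "filter~header" {source=$2;linecount=$3;next}

    Returns (source, linecount, remaining_lines), where remaining_lines are
    the records that awk would pass on to any later rules in the script.
    """
    source = None
    linecount = None
    remaining = []
    for line in lines:
        fields = line.split()  # assumption: default whitespace field splitting
        if fields and fields[0] == "filter~header":
            # awk treats missing fields as empty strings
            source = fields[1] if len(fields) > 1 else ""
            linecount = fields[2] if len(fields) > 2 else ""
            continue  # equivalent of awk's `next`
        remaining.append(line)
    return source, linecount, remaining


if __name__ == "__main__":
    # Hypothetical input; the real format of the filter's input is not shown
    # anywhere in this thread.
    sample = ["filter~header dumplog 42", "some other log line"]
    print(apply_header_rule(sample))
```

Note that linecount stays a string here, as it would in awk until it is used in a numeric context.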
If I may ask, do you have anything like this in your goserver.end.filter file?
josh