IT WORKS!!!
Special thanks to:
Tom Wike for verifying that BudTool 4.6 actually does work with OnTap 5.1.2R3P1 despite some confusing/conflicting recommendations from both NetApp and Intelliguard.
Bamiyan Gobets & Grant Schuerfeld from NetApp for verifying all our configs.
And finally to Shona McNeill who experienced the same problems and told me to check our jbcap file. After copying the entry for our jukebox from the old jbcap into the new, everything worked great!
If any of you make it to the Philadelphia area, drop me a note so I can buy you a beer.
No thanks to:
Intelliguard support... :(
Unfortunately, we just noticed that the file history database merges are not working properly. Hopefully we can resolve this ourselves, 'cause Intelliguard isn't any help....
Thanks again to everyone who provided help.
jason
In message 199902241959.OAA28405@jagular.dev.susq.com, "Jason D. Kelleher" writes:
On the evening of Saturday February 20 we upgraded from BudTool 4.5 to 4.6. Since then, we've not been able to do local NDMP backups on our filer. Anyone out there experience similar problems? If so, would you please throw some suggestions my way.
We have support calls open with both NetApp (being very helpful considering it's probably not an issue with their system) and Intelliguard (with what we pay them I'd assume they could at least hire warm bodies). Here's what we've come up with so far:
1) Intelliguard told us to add the line "ROBOT_SCSI_LUN 1" to the jbmgr_config for the filer. This allowed the jbmgr daemon to start.
audreyii: /usr/budtool/bud [225] % cat jbmgr_config.audreyii.2
ROBOT_DEV_HOST nfssrv1
ROBOT_DEV_NAME spt0
ROBOT_SCSI_ID 2
ROBOT_SCSI_BUS 2
ROBOT_SCSI_LUN 1
DATA_DEV_NAME0 nrst0a
DATA_DEV_HOST0 nfssrv1
PASSWORD xxxxxxxx
USEBARCODE N
VALIDATECMD "$BTHOME/bin/jbupdate audreyii:2 -a"
audreyii: /usr/budtool/bud [226] %
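A quick way to sanity-check a config like the one above before restarting jbmgr is to parse it and confirm the SCSI address fields are all present. This is only a sketch: the "KEY value" format is inferred from the listing, and the function names and the list of required keys are hypothetical, not anything BudTool documents.

```python
# Text copied from the listing above; the password is already masked there.
CONFIG_TEXT = '''\
ROBOT_DEV_HOST nfssrv1
ROBOT_DEV_NAME spt0
ROBOT_SCSI_ID 2
ROBOT_SCSI_BUS 2
ROBOT_SCSI_LUN 1
DATA_DEV_NAME0 nrst0a
DATA_DEV_HOST0 nfssrv1
PASSWORD xxxxxxxx
USEBARCODE N
VALIDATECMD "$BTHOME/bin/jbupdate audreyii:2 -a"
'''

# Assumption: these are the keys worth checking. ROBOT_SCSI_LUN is the one
# Intelliguard said had to be added; the other two are included by analogy.
REQUIRED_KEYS = ("ROBOT_SCSI_ID", "ROBOT_SCSI_BUS", "ROBOT_SCSI_LUN")


def parse_jbmgr_config(text):
    """Parse whitespace-separated 'KEY value' lines into a dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        # Drop the surrounding double quotes used by VALIDATECMD.
        settings[key] = value.strip().strip('"')
    return settings


def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that do not appear in the config."""
    return [key for key in required if key not in settings]


settings = parse_jbmgr_config(CONFIG_TEXT)
print("missing:", missing_keys(settings))
```

Deleting the ROBOT_SCSI_LUN line from CONFIG_TEXT makes missing_keys() report it, which is the situation described in step 1.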
2) Running the BudTool probe-scsi command shows the devices:
audreyii: ~ [230] % probe-scsi -h nfssrv1
Copyright (c) 1990-1999 by Intelliguard Software Inc.
probe-scsi BudTool 4.6
SCSI Bus 0
SCSI Bus 1
SCSI Bus 2
  Target 2
    Unit 0 Removable Tape EXABYTE EXB-8500 CC37
    Unit 1 Removable Jukebox Device EXABYTE TZ Media Changer CC37
SCSI Bus 3
SCSI Bus 4
audreyii: ~ [231] %
3) After placing 2 labeled BudTool tapes into the jukebox, running jbupdate returns instantly showing no tapes in the jukebox (doesn't even attempt to read tape labels).
audreyii: ~ [236] % jbupdate audreyii:2 -a
------ W A R N I N G -------
You have selected to force an update on all 5 tapes. This process requires several minutes per tape for unbarcoded tapes. Large jukeboxes with unbarcoded tapes can take several hours. The PID of this process is 8721. Use a HUP, INT, TERM or QUIT to terminate this process if you do not want to continue.
Update complete.
audreyii: ~ [237] % jbmgr -H audreyii:2 ls -a
0:
1:
2:
3:
4:
audreyii: ~ [238] %
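The symptom in the listing above (every slot number followed by nothing) is easy to detect mechanically. The sketch below assumes each line of the "ls -a" output is a slot number, a colon, and then the tape label if one was read; only the empty form appears in this thread, so the labeled form and the function names are assumptions.

```python
# The five lines printed by "jbmgr -H audreyii:2 ls -a" above.
LS_OUTPUT = "0:\n1:\n2:\n3:\n4:\n"


def parse_slots(text):
    """Map slot number -> label text ('' when nothing follows the colon)."""
    slots = {}
    for line in text.splitlines():
        number, sep, label = line.partition(":")
        if not sep or not number.strip().isdigit():
            continue  # not a slot line (prompt, banner, blank line)
        slots[int(number)] = label.strip()
    return slots


def empty_slots(slots):
    """Return the slot numbers that have no label, in ascending order."""
    return [number for number, label in sorted(slots.items()) if not label]


slots = parse_slots(LS_OUTPUT)
print("empty slots:", empty_slots(slots))
```

With two labeled tapes loaded, a working update should leave at least two slots out of that list; here all five come back empty.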
4) We can perform remote NDMP backups to our BudTool master media server, but this isn't practical at ~2.2 GB/hr with 70+ GB on our filer. (It's an old, slow F330 which I'm supposed to be replacing instead of troubleshooting...)
5) Gave Intelliguard a copy of our config file, the jbmgr diag log, and the probe-scsi output for analysis. The diag file does show some errors:
6) After not receiving much help from Intelliguard, we called NetApp. They suggested we upgrade from 5.1.2P2 to 5.1.2R3P1 to remove the possibility that this was related to an NDMPD bug (6417).
7) Spoke to Intelliguard again Monday and informed them that we were going to upgrade the filer. We decided to see if the upgrade solved the problem. (They managed to fix a btcp issue after I waited 4 hours for a call back -- I'm still waiting for our sales rep to tell me exactly what kind of turnaround to expect when paying extra for 24x7 support.)
8) Nothing changed after the upgrade Monday night. I informed NetApp; they opened another call and had us send them more info.
9) Told Intelliguard, gave them another copy of the diag file.
Configured BudTool to do lev 1 remote backups of the filer; this should hold us until the weekend.
Had to take a personal day Tuesday (for personal reasons). Hoped I'd get a couple of voice/e-mails telling me that someone had figured out what was wrong. (Finding out we were stupid and didn't have BudTool configured correctly would have been a relief.)
Wednesday we received some follow-up calls from NetApp, the call was escalated, and we're currently expecting a call.
We verified that tapes can be manually loaded into the jukebox drive and dump run on the console of the filer. Pretty confident that the jukebox didn't coincidentally die during the BudTool upgrade.
Will file missing persons reports for Intelliguard employees in another 24 hours. (Anyone have the number for the police in Dublin, CA?)
So far, I'm confident it's a BudTool issue. (Many thanks/apologies to NetApp if it is.) I'd be very grateful if someone could give me a possible avenue of investigation. I'm at a dead-end and looking at over 30 hours of backups to lev 0 our filer this weekend... :(
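The "over 30 hours" figure follows directly from the numbers in item 4. A small sketch of the arithmetic, using 70 GB as a lower bound since the filer holds "70+ GB" (the function name is just for illustration):

```python
def backup_hours(data_gb, rate_gb_per_hr):
    """Hours needed to move data_gb at a sustained rate_gb_per_hr."""
    if rate_gb_per_hr <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hr


# 70 GB at ~2.2 GB/hr over the network to the master media server.
hours = backup_hours(70, 2.2)
print(f"{hours:.1f} hours")  # prints "31.8 hours"
```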
thanks,
jason
Jason D. Kelleher                    kelleher@susq.com
Susquehanna Partners, G.P.           610.617.2721 (voice)
401 City Line Ave, Suite 220         610.617.2916 (fax)
Bala Cynwyd, PA 19004-1122
"Jason D. Kelleher" wrote:
... Special thanks to:
Tom Wike for verifying that BudTool 4.6 actually does work with OnTap 5.1.2R3P1 despite some confusing/conflicting recommendations from both NetApp and Intelliguard.
... No thanks to:
Intelliguard support... :( Unfortunately, we just noticed that the file history database merges are not working properly. Hopefully we can resolve this ourselves, 'cause Intelliguard isn't any help....
When I upgraded to BudTool 4.6, my file history database merges failed too, except for one system. That one system's name was not fully qualified, so its merges worked. All the merges failed for systems with fully qualified host names. This is as seen in $BTHOME/hist. First, a patch to BudTool's patch program was required. Then, they gave me a "one-off" patch, to make the merges work. It did, except that when my toaster's database was converted to the 4.6 format and the saved-up file histories were merged in, the size grew past 500MB and the database for that system split in two. So far so good. Then that system's file history merges failed again, this time because the continuation file had a bad magic number.
It's now been several months, and Intelliguard has no fix for this and hasn't been providing me with a weekly status as they promised they would. In fact, I never hear from them. I temporarily solved the problem myself, about a month ago. I removed the database for that system, restored the pre-4.6 unconverted database and the then-unmerged file histories, raised the split size from 500MB to 1.5GB and then re-merged everything. That got me a single database file that continues to get merged into nightly. However, when I reach the point where the database will split again, my merges will fail again. I have no faith at all that Intelliguard will find a fix before that happens.
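Since the merges break once the database crosses the split size, it may be worth watching how close the file is getting. The sketch below only does the comparison; where the size comes from (for example os.path.getsize on whatever file lives under $BTHOME/hist), the 90% warning level, and the function names are all assumptions.

```python
MB = 1024 * 1024  # assumption: binary megabytes; the ratio is what matters


def headroom_fraction(db_size_bytes, split_size_bytes):
    """Fraction of the split size still unused (negative once past it)."""
    return 1.0 - db_size_bytes / split_size_bytes


def near_split(db_size_bytes, split_size_bytes, warn_at=0.9):
    """True when the database has reached warn_at of the split size."""
    return db_size_bytes >= warn_at * split_size_bytes


# Hypothetical sizes checked against the 500MB split size mentioned above.
for size_mb in (300, 460):
    flag = near_split(size_mb * MB, 500 * MB)
    print(size_mb, "MB ->", "WARN" if flag else "ok")
```

Run nightly after the merge, a check like this would at least give some notice before the next split.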
-ste
"Shaun T. Erickson" ste@research.bell-labs.com writes:
When I upgraded to BudTool 4.6, my file history database merges failed too, except for one system. That one system's name was not fully qualified, so its merges worked. All the merges failed for systems with fully qualified host names. This is as seen in $BTHOME/hist. First, a patch to BudTool's patch program was required. Then, they gave me a "one-off" patch, to make the merges work. It did, except that when my toaster's database was converted to the 4.6 format and the saved-up file histories were merged in, the size grew past 500MB and the database for that system split in two. So far so good. Then that system's file history merges failed again, this time because the continuation file had a bad magic number.
We have just installed BudTool 4.6 from scratch, hosted on a Sun running Solaris 7, backing up one of our newly-installed NetApps, all of which shipped running 5.1.2R3.
Once we'd fixed the jbcap file and could actually talk to the jukebox, we found that history merges fail with msrt coredumping.
We _are_ using FQDNs, but then this one copy of BudTool will be eventually controlling backups for NetApps in different subdomains, so not using FQDNs is not really an option here.
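Given that the merge failures reported so far line up with fully qualified names, sorting a client list into qualified and unqualified names shows which systems are likely to be affected. This is a rough sketch: "has a dot-separated domain part" is a simplification (an IP address would be misclassified), the function names are hypothetical, and the two hostnames are simply ones that appear earlier in this thread.

```python
def is_fully_qualified(hostname):
    """Treat any name with a non-empty domain part as fully qualified."""
    labels = hostname.rstrip(".").split(".")
    return len(labels) > 1 and all(labels)


def split_by_qualification(hostnames):
    """Return (qualified, unqualified) lists, preserving input order."""
    qualified = [h for h in hostnames if is_fully_qualified(h)]
    unqualified = [h for h in hostnames if not is_fully_qualified(h)]
    return qualified, unqualified


qualified, unqualified = split_by_qualification(
    ["nfssrv1", "jagular.dev.susq.com"]
)
print("fully qualified:", qualified)
print("short names:", unqualified)
```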
Iguard have mentioned nothing of patches; their answer is for us to upgrade the filer to 5.2.1 as "BudTool is certified against that version". (We refused their first suggestion to downgrade the filer to 5.1.2P2.)
I am loath to schedule downtime on a busy filer for what may prove an unnecessary upgrade. Is there anyone out there using BudTool 4.6 with file histories _and_ FQDNs _and_ a filer running 5.2.1 who can comment on whether this upgrade will have any effect on the problem whatsoever?
Shona McNeill
Would anyone from Network Appliance care to comment on anything about this thread?
Yours waiting in anticipation
James Lowe Intelliguard Software International
Shona McNeill wrote:
"Shaun T. Erickson" ste@research.bell-labs.com writes:
When I upgraded to BudTool 4.6, my file history database merges failed too, except for one system. That one system's name was not fully qualified, so its merges worked. All the merges failed for systems with fully qualified host names. This is as seen in $BTHOME/hist. First, a patch to BudTool's patch program was required. Then, they gave me a "one-off" patch, to make the merges work. It did, except that when my toaster's database was converted to the 4.6 format and the saved-up file histories were merged in, the size grew past 500MB and the database for that system split in two. So far so good. Then that system's file history merges failed again, this time because the continuation file had a bad magic number.
We have just installed BudTool 4.6 from scratch, hosted on a Sun running Solaris 7, backing up one of our newly-installed NetApps, all of which shipped running 5.1.2R3.
Once we'd fixed the jbcap file and could actually talk to the jukebox, we found that history merges fail with msrt coredumping.
We _are_ using FQDNs, but then this one copy of BudTool will be eventually controlling backups for NetApps in different subdomains, so not using FQDNs is not really an option here.
Iguard have mentioned nothing of patches; their answer is for us to upgrade the filer to 5.2.1 as "BudTool is certified against that version". (We refused their first suggestion to downgrade the filer to 5.1.2P2.)
I am loath to schedule downtime on a busy filer for what may prove an unnecessary upgrade. Is there anyone out there using BudTool 4.6 with file histories _and_ FQDNs _and_ a filer running 5.2.1 who can comment on whether this upgrade will have any effect on the problem whatsoever?
Shona McNeill
shona@netline.net.uk