On the evening of Saturday, February 20, we upgraded from BudTool 4.5
to 4.6. Since then, we've not been able to do local NDMP backups
on our filer. Has anyone out there experienced similar problems? If
so, would you please throw some suggestions my way?
We have support calls open with both NetApp (being very helpful
considering it's probably not an issue with their system) and
Intelliguard (with what we pay them I'd assume they could at least
hire warm bodies). Here's what we've come up with so far:
1) Intelliguard told us to add the line "ROBOT_SCSI_LUN 1" to
the jbmgr_config for the filer. This allowed the jbmgr
daemon to start.
audreyii: /usr/budtool/bud [225] % cat jbmgr_config.audreyii.2
ROBOT_DEV_HOST nfssrv1
ROBOT_DEV_NAME spt0
ROBOT_SCSI_ID 2
ROBOT_SCSI_BUS 2
ROBOT_SCSI_LUN 1
DATA_DEV_NAME0 nrst0a
DATA_DEV_HOST0 nfssrv1
PASSWORD xxxxxxxx
USEBARCODE N
VALIDATECMD "$BTHOME/bin/jbupdate audreyii:2 -a"
audreyii: /usr/budtool/bud [226] %
2) Running the BudTool probe-scsi command shows the devices:
audreyii: ~ [230] % probe-scsi -h nfssrv1
Copyright (c) 1990-1999 by Intelliguard Software Inc. probe-scsi BudTool 4.6
SCSI Bus 0
SCSI Bus 1
SCSI Bus 2
Target 2
Unit 0 Removable Tape EXABYTE EXB-8500 CC37
Unit 1 Removable Jukebox Device EXABYTE TZ Media Changer CC37
SCSI Bus 3
SCSI Bus 4
audreyii: ~ [231] %
3) After placing 2 labeled BudTool tapes into the jukebox,
running jbupdate returns instantly showing no tapes in the
jukebox (doesn't even attempt to read tape labels).
audreyii: ~ [236] % jbupdate audreyii:2 -a
------------------------------------------------------------------
------ W A R N I N G -------
You have selected to force an update on all 5 tapes. This process
requires several minutes per tape for unbarcoded tapes. Large
jukeboxes with unbarcoded tapes can take several hours. The PID of
this process is 8721. Use a HUP, INT, TERM or QUIT to terminate
this process if you do not want to continue.
------------------------------------------------------------------
Update complete.
audreyii: ~ [237] % jbmgr -H audreyii:2 ls -a
0:
1:
2:
3:
4:
audreyii: ~ [238] %
4) We can perform remote NDMP backups to our BudTool master
media server, but this isn't practical at ~2.2 GB/hr with
70+ GB on our filer. (It's an old, slow F330 which I'm
supposed to be replacing instead of trouble-shooting...)
5) Gave Intelliguard a copy of our config file, the jbmgr diag
log, and the probe-scsi output for analysis. The diag file
does show some errors:
6) After not receiving much help from Intelliguard, we called
NetApp. They suggested we upgrade from 5.1.2P2 to
5.1.2R3P1 to remove the possibility that this was related to
an NDMPD bug (6417).
7) Spoke to Intelliguard again Monday and informed them that we
were going to upgrade the filer. We decided to see if the
upgrade solved the problem. (They managed to fix a btcp
issue after I waited 4 hours for a call back -- I'm still
waiting for our sales rep to tell me exactly what kind of
turnaround to expect when paying extra for 24x7 support.)
8) Nothing changed after the upgrade Monday night. I informed
NetApp; they opened another call and had us send them more
info.
9) Told Intelliguard, gave them another copy of the diag file.
10) Configured BudTool to do lev 1 remote backups of the filer;
this should hold us until the weekend.
11) Had to take a personal day Tuesday (for personal reasons).
Hoped I'd get a couple of voice/e-mails telling me that someone
figured out what was wrong. (Finding out we were stupid
and didn't have BudTool configured correctly would have
been a relief.)
12) Wednesday we received some follow-up calls from NetApp, the
call was escalated, and we're currently expecting a call.
13) We verified that tapes can be manually loaded into the
jukebox drive and dump run on the console of the filer.
Pretty confident that the jukebox didn't coincidentally die
during the BudTool upgrade.
14) Will file missing persons reports for Intelliguard
employees in another 24 hours. (Anyone have the number for
the police in Dublin, CA?)
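
As a sanity check on items 1 and 2, the robot address in the jbmgr
config can be compared against the probe-scsi listing. The Python
below is only a rough sketch: the parsing rules are guessed from how
the two listings above look, not from any documented BudTool format,
and it assumes that ROBOT_SCSI_ID corresponds to the "Target" number
in the probe-scsi output. The function names are made up for
illustration.

```python
import re

# Text copied from the listings above. The parsing rules are assumptions
# based on how that output looks, not on a documented BudTool format.
JBMGR_CONFIG = """\
ROBOT_DEV_HOST nfssrv1
ROBOT_DEV_NAME spt0
ROBOT_SCSI_ID 2
ROBOT_SCSI_BUS 2
ROBOT_SCSI_LUN 1
DATA_DEV_NAME0 nrst0a
DATA_DEV_HOST0 nfssrv1
"""

PROBE_OUTPUT = """\
SCSI Bus 0
SCSI Bus 1
SCSI Bus 2
Target 2
Unit 0 Removable Tape EXABYTE EXB-8500 CC37
Unit 1 Removable Jukebox Device EXABYTE TZ Media Changer CC37
SCSI Bus 3
SCSI Bus 4
"""


def parse_config(text):
    """Return a dict of KEY -> value from whitespace-separated lines."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def parse_probe(text):
    """Return a dict of (bus, target, unit) -> device description."""
    devices = {}
    bus = target = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"SCSI Bus (\d+)$", line)
        if m:
            # A new bus starts; forget the target from the previous bus.
            bus, target = int(m.group(1)), None
            continue
        m = re.match(r"Target (\d+)$", line)
        if m:
            target = int(m.group(1))
            continue
        m = re.match(r"Unit (\d+)\s+(.*)$", line)
        if m and bus is not None and target is not None:
            devices[(bus, target, int(m.group(1)))] = m.group(2)
    return devices


def robot_address(settings):
    """Assumed mapping: (ROBOT_SCSI_BUS, ROBOT_SCSI_ID, ROBOT_SCSI_LUN)."""
    return (
        int(settings["ROBOT_SCSI_BUS"]),
        int(settings["ROBOT_SCSI_ID"]),
        int(settings["ROBOT_SCSI_LUN"]),
    )


settings = parse_config(JBMGR_CONFIG)
devices = parse_probe(PROBE_OUTPUT)
addr = robot_address(settings)
description = devices.get(addr, "")
robot_ok = "Jukebox" in description
print(addr, description or "no device found", "OK" if robot_ok else "MISMATCH")
```

Under those assumptions, bus 2 / ID 2 / LUN 1 lands on the Media
Changer entry rather than the tape drive at LUN 0, which at least
agrees with jbmgr starting once the ROBOT_SCSI_LUN line was added. It
says nothing about why jbupdate finds no tapes.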
So far, I'm confident it's a BudTool issue. (Many thanks/apologies
to NetApp if it is.) I'd be very grateful if someone could give me
a possible avenue of investigation. I'm at a dead-end and looking
at over 30 hours of backups to lev 0 our filer this weekend... :(
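
For reference, the "over 30 hours" figure is just the numbers from
item 4 divided out. A trivial Python sketch, treating the approximate
figures quoted above (~2.2 GB/hr, 70+ GB) as exact and assuming a
constant transfer rate:

```python
# Figures quoted above; both are approximate ("~2.2 GB/hr", "70+ GB").
RATE_GB_PER_HR = 2.2
DATA_GB = 70


def backup_hours(data_gb, rate_gb_per_hr):
    """Hours needed to move data_gb at a constant rate."""
    return data_gb / rate_gb_per_hr


hours = backup_hours(DATA_GB, RATE_GB_PER_HR)
print(f"{hours:.1f} hours")
```

Since the filer holds somewhat more than 70 GB, this is a lower bound
on the time for a remote lev 0.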
thanks,
jason
---
Jason D. Kelleher kelleher(a)susq.com
Susquehanna Partners, G.P. 610.617.2721 (voice)
401 City Line Ave, Suite 220 610.617.2916 (fax)
Bala Cynwyd, PA 19004-1122