In message 36D6915F.EA2416A6@iguard.com, James Lowe writes:
Would anyone from Network Appliance care to comment on anything about this thread?
Yours waiting in anticipation
James Lowe Intelliguard Software International
I'm not from NetApp, but I'd love to comment...
We recently (Saturday) upgraded from BudTool 4.5 to 4.6. After Shona told us to check the jbcap file (care to comment why AFTER 6 DAYS NO ONE AT INTELLIGUARD HAD HELPED US!!!) so that BudTool 4.6 could control the jukebox on our filer, we now have the following systems running:
BudTool 4.6
SunOS 4.1.3_U1, Solaris 2.5.1, 2.6, & 2.7
OnTap 5.1.2R3P1 (recommended by NetApp support)
The NetApp is working great.
The only problems we're currently having regard errors in the btmerge reports:
=============================================
=============================================
Running btmerge for host hercules.dev.susq.com
Command: (btmerge -s / -o /audreyii/budtool/hist/hercules.dev.susq.com/db.hercules.dev.susq.com.n.of.m.tmp -m /audreyii/budtool/hist/hercules.dev.susq.com/db.hercules.dev.susq.com.n.of.m -H hercules.dev.susq.com -c /usr/budtool/bud/.budfclass -i /usr/budtool/hist/db.hercules.dev.susq.com -i /usr/budtool/build/db.hercules.dev.susq.com.int.0)
******************************************************************
------ E R R O R -------
Cannot get maximum file history database continuation number: (Error 0)
******************************************************************
******************************************************************
------ E R R O R -------
A btmerge input file was busy. Rebuilding list of input files.
******************************************************************
=============================================
=============================================
These errors refer to a Solaris 2.6 system with an FQDN in the .buddb. After we removed all FQDNs from our .buddb, everything worked fine. So, I think you should quit pointing fingers at NetApp and FIX THE PROBLEM!
jason
--
Jason D. Kelleher                      kelleher@susq.com
Susquehanna Partners, G.P.             610.617.2721 (voice)
401 City Line Ave, Suite 220           610.617.2916 (fax)
Bala Cynwyd, PA 19004-1122
Shona McNeill wrote:
"Shaun T. Erickson" ste@research.bell-labs.com writes:
When I upgraded to BudTool 4.6, my file history database merges failed too, except for one system. That one system's name was not fully qualified, so its merges worked. All the merges failed for systems with fully qualified host names. This is as seen in $BTHOME/hist. First, a patch to BudTool's patch program was required. Then, they gave me a "one-off" patch to make the merges work. It did, except that when my toaster's database was converted to the 4.6 format and the saved-up file histories were merged in, the size grew past 500MB and the database for that system split in two. So far so good. Then that system's file history merges failed again, this time because the continuation file had a bad magic number.
We have just installed BudTool 4.6 from scratch, hosted on a Sun running Solaris 7, backing up one of our newly-installed NetApps, all of which shipped running 5.1.2R3.
Once we'd fixed the jbcap file and could actually talk to the jukebox, we found that history merges fail with msrt coredumping.
We _are_ using FQDNs, but this one copy of BudTool will eventually be controlling backups for NetApps in different subdomains, so not using FQDNs is not really an option here.
Iguard have mentioned nothing of patches; their answer is for us to upgrade the filer to 5.2.1, as "BudTool is certified against that version". (We refused their first suggestion, to downgrade the filer to 5.1.2P2.)
I am loath to schedule downtime on a busy filer for what may prove an unnecessary upgrade. Is there anyone out there using BudTool 4.6 with file histories _and_ FQDNs _and_ a filer running 5.2.1 who can comment on whether this upgrade will have any effect on the problem whatsoever?
Shona McNeill
shona@netline.net.uk
Errr... sorry, Mr Kelleher, I thought the problem that we were discussing was core dumps from BudTool due to 'bad' file history streams from the filers, found by the NetApp tech team and resolved in the 5.1.2P2 patch.
First, a patch to BudTool's patch program was required.
Oh dear me... software that needs patches... now that is unheard of, isn't it?
Then, they gave me a "one-off" patch to make the merges work. It did,
Patches that work? My oh my... whatever next?
except that when my toaster's database was converted to the 4.6 format and the saved-up file histories were merged in, the size grew past 500MB and the database for that system split in two. So far so good. Then that system's file history merges failed again, this time because the continuation file had a bad magic number.
...and guess what... this is a KNOWN PROBLEM WITH THE ONTAP OS PRIOR TO 5.1.2P2.
And if you are really, really lucky, you might see all the CPU sucked out of your BudTool machine by BudTool's merge binaries as they try to negotiate 'bad' file history data streams from the filer. You can watch your network grind to a halt as your NFS falls over, and see your precious disk space fill up with a lovely juicy core file fresh from the MSRT oven.
Now, it was us producing the core file; we have never denied that. But instead of... as you so elegantly put it... let me see...
ah yes...
So, I think you should quit pointing fingers at NetApp and FIX THE PROBLEM!
We didn't point fingers, oh no... through what is known as 'cooperation' and being 'proactive', Intelliguard AND Network Appliance worked on it... and you know what? 5.1.2P2 was produced, and nothing was needed for BudTool. You see, NDMP is a 'joint' effort (not just with Intelliguard and NetApp, I might add). And you know what else? We did the same with NetApp for 5.1.2... we worked together, and now we are working towards OnTap 5.3. Now, I don't know exact figures, but I reckon that between 5.0.1 and 5.1.2 there have been a few dozen P and R and D (and whatever other letters you care to use) patches for OnTap, some of them making quite significant changes to the filer, I believe, and some of them out less than 24 hours after the previous ones. If ANY software company could test against all these different versions in such a short space of time, I would be very impressed. Indeed, some of these extra patches from NetApp were to fix bugs in previous patches released only a day or so earlier.
As far as I am aware (and you will need to verify this with NetApp, I suppose, as you obviously have no time for Intelliguard), the release of OnTap that NetApp has recommended to you has not been certified against BudTool 4.6 (as far as I, or any of the team in Europe, have been informed, anyway). So far be it from me to start advising people to upgrade to *operating systems* that may or may not work with our product.
You may not care about any of this and just want a fix; fair enough. However, before you start 'spouting off' at people like myself who genuinely try to help as much as we possibly can, can I point out one last thing, please...
(care to comment why AFTER 6 DAYS NO ONE AT INTELLIGUARD HAD HELPED US!!!)
I won't ever defend genuinely poor 'support' from Intelliguard, especially if you have paid for the maintenance; however, why did you have to wait 6 days anyway? If the problem was as urgent as you have blatantly made obvious it was, why were you not on the phone and bombarding the support team with email until you had your answer, getting cynical, rude and arrogant as you do so well on this mailing list, and even asking to be put on hold until a 'manager' was made available to you? You may say you shouldn't have to, and I would agree wholeheartedly... but life's funny like that, Jason... sometimes it lets you down...
Have a good weekend
James Lowe Intelliguard Software International
Mr. Lowe,
Perhaps you should pay attention to whom you attribute statements before you get upset at the wrong person. Many of the quotes you attributed to Mr. Kelleher were made not by him, but by me, Shaun Erickson. Your sarcasm was unwarranted, and not all of your statements are correct.
James Lowe wrote:
Errr... sorry, Mr Kelleher, I thought the problem that we were discussing was core dumps from BudTool due to 'bad' file history streams from the filers, found by the NetApp tech team and resolved in the 5.1.2P2 patch.
No. The problem I was discussing was the bad magic number that gets placed into the file history database continuation file at the time that the database is split into more than one file. The problem cannot come from the data coming from OnTap, as the nightly file histories get flawlessly merged into a single database file, and the merges only fail AFTER your software splits the file and places that bad magic number in it. Jason apparently had the FQDN problem, as I did, but my problems went further.
First, a patch to BudTool's patch program was required.
Oh dear me... software that needs patches... now that is unheard of, isn't it?
I wasn't complaining that patch required patching. I simply was describing what was required to get the system working properly after my upgrade to 4.6. The first thing that needed to be done was to patch patch. So far, no problem.
Then, they gave me a "one-off" patch to make the merges work. It did,
Patches that work? My oh my... whatever next?
Again, I wasn't complaining. In fact, I stated that it did, for the most part, fix my problem. So far, still no problem.
except that when my toaster's database was converted to the 4.6 format and the saved-up file histories were merged in, the size grew past 500MB and the database for that system split in two. So far so good. Then that system's file history merges failed again, this time because the continuation file had a bad magic number.
...and guess what... this is a KNOWN PROBLEM WITH THE ONTAP OS PRIOR TO 5.1.2P2.
No. This is not how the problem has been represented to me. In all of my discussions with Intelliguard personnel, it has only been suggested that this is probably caused by the code that splits the database when it reaches the specified size limit. Additionally, I *AM*, and was, running OnTap 5.1.2P2, at Intelliguard's request, and running it does *NOT* solve this problem.
And if you are really, really lucky, you might see all the CPU sucked out of your BudTool machine by BudTool's merge binaries as they try to negotiate 'bad' file history data streams from the filer. You can watch your network grind to a halt as your NFS falls over, and see your precious disk space fill up with a lovely juicy core file fresh from the MSRT oven.
Yes, that problem exists, and 5.1.2P2 supposedly fixes it (I don't know from personal experience, as I didn't experience that particular problem), but that's not the problem I was discussing, or am having.
Now, it was us producing the core file; we have never denied that. But instead of... as you so elegantly put it... let me see...
ah yes...
So, I think you should quit pointing fingers at NetApp and FIX THE PROBLEM!
We didn't point fingers, oh no... through what is known as 'cooperation' and being 'proactive', Intelliguard AND Network Appliance worked on it... and you know what? 5.1.2P2 was produced, and nothing was needed for BudTool. You see, NDMP is a 'joint' effort (not just with Intelliguard and NetApp, I might add). And you know what else? We did the same with NetApp for 5.1.2... we worked together, and now we are working towards OnTap 5.3. Now, I don't know exact figures, but I reckon that between 5.0.1 and 5.1.2 there have been a few dozen P and R and D (and whatever other letters you care to use) patches for OnTap, some of them making quite significant changes to the filer, I believe, and some of them out less than 24 hours after the previous ones. If ANY software company could test against all these different versions in such a short space of time, I would be very impressed. Indeed, some of these extra patches from NetApp were to fix bugs in previous patches released only a day or so earlier.
You may claim you work closely with NetApp, but I've yet to be convinced of that. When I have a problem with BudTool that involves my NetApp, I've almost always been told how horrible NetApp is for releasing so many versions of their OS and how hard it is to keep recertifying your stuff with theirs. As a former software tester, I agree; it probably is quite difficult. But I'd rather you folks worked with me to solve my problems than wasted my time whining about NetApp. If you have a problem with NetApp's release schedule, work with them; don't whine to me.
As far as I am aware (and you will need to verify this with NetApp, I suppose, as you obviously have no time for Intelliguard), the release of OnTap that NetApp has recommended to you has not been certified against BudTool 4.6 (as far as I, or any of the team in Europe, have been informed, anyway). So far be it from me to start advising people to upgrade to *operating systems* that may or may not work with our product.
You might not do that advising, but I got really burned once by Intelliguard. I had a need to upgrade my filer to 5.1.x, to fix a problem, but was told by your company that BudTool 4.5 didn't support it. I really needed to do that upgrade, yet 4.6 wasn't out yet. So, I got both companies on the phone, and was told that my 4.5 would be supported if I upgraded my filer. So I did, ran into problems with my backups, and was promptly told by Intelliguard that I wouldn't be supported in that configuration. As I said, I got burned.
You may not care about any of this and just want a fix; fair enough. However, before you start 'spouting off' at people like myself who genuinely try to help as much as we possibly can, can I point out one last thing, please...
(care to comment why AFTER 6 DAYS NO ONE AT INTELLIGUARD HAD HELPED US!!!)
I won't ever defend genuinely poor 'support' from Intelliguard, especially if you have paid for the maintenance; however, why did you have to wait 6 days anyway? If the problem was as urgent as you have blatantly made obvious it was, why were you not on the phone and bombarding the support team with email until you had your answer, getting cynical, rude and arrogant as you do so well on this mailing list, and even asking to be put on hold until a 'manager' was made available to you? You may say you shouldn't have to, and I would agree wholeheartedly... but life's funny like that, Jason... sometimes it lets you down...
Have a good weekend
James Lowe Intelliguard Software International
I can't speak for Jason, but I can tell you that as weeks and weeks and weeks went by with no fix for my magic number problem forthcoming, and people at Intelliguard repeatedly failed to return my calls or to call me regularly with updates as they had promised, I tended to get angry and rude and to wonder if my money wouldn't be better spent elsewhere. Perhaps that's less than professional of me. If so, I apologize, but I still don't have a fix for that problem and continue not to hear from your company regarding a resolution to it. Additionally, I find Intelliguard's support people (at least the ones assigned to me) to be rude, inattentive, inexperienced and, at times, seemingly incompetent, as I've stated elsewhere. This makes it most difficult when I have to call in with a problem and I want to deal with *it*, not with *people* issues.
I don't think it's fair to the toasters list to keep hashing this out here. If you reply to this, please reply to me only, so as to not annoy the rest of the group unnecessarily.
-ste