Ok, so I've installed the latest and greatest development release that's supposed to fix our clustered_filer_reboots_randomly_during_NDMP_dumps problem, and now, instead of randomly rebooting, our filer has gone the random_refusal_of_NDMP_connections route. Which leads me to my point: has anyone else who installed this release to solve this problem seen similar results?
Curious,
Mark
Wow, I check my mail and in two consecutive e-mails I lose two clicks of faith in NetApps. Are these critical NDMP problems limited to 5.3.X, or does 5.2.X have issues I should know about? Is there a 5.3 release that gives me the NDMP functionality I need without the NDMP bugs I don't?
Does NetApp have any kind of document, internal or external, of the form: "How to use NDMP with NetBackup 3.2 without making your filers reboot randomly, or refuse backups, or catch fire"?
On Mon, 2 Aug 1999, Stoltz wrote:
Ok, so I've installed the latest and greatest development release that's supposed to fix our clustered_filer_reboots_randomly_during_NDMP_dumps problem, and now, instead of randomly rebooting, our filer has gone the random_refusal_of_NDMP_connections route. Which leads me to my point: has anyone else who installed this release to solve this problem seen similar results?
Curious,
Mark
Benjy Feen benjy@feen.com There is no spoon.
On Mon, 2 Aug 1999, Benjy Feen wrote:
Wow, I check my mail and in two consecutive e-mails I lose two clicks of faith in NetApps. Are these critical NDMP problems limited to 5.3.X, or does 5.2.X have issues I should know about? Is there a 5.3 release that gives me the NDMP functionality I need without the NDMP bugs I don't?
Does NetApp have any kind of document, internal or external, of the form: "How to use NDMP with NetBackup 3.2 without making your filers reboot randomly, or refuse backups, or catch fire"?
This is exactly why I moved away from using NDMP, and why I would like to find a backup solution that doesn't require it. Just a few days ago, when I mentioned this, NetApp went on and on about how they understood they had a problem, how hard they had worked to fix it, and how wonderful the new NDMP code written in Java was. And then we see these problems.
I have little faith in NetApp's ability to properly implement this protocol. Backups are the single most important function I manage as an administrator, and NetApp seems hell-bent on making that process a nightmare for me, rather than providing me with a rock-solid solution.
-ste
Today, after yet another service interruption from a failed NIC, I realized the obvious, that I should have redundant or at least more reliable network interfaces on all the filers. The easy short-term solution is to have configured but downed second interfaces ready for quick duty, so I'll be setting that up tomorrow. What I'd like to hear from toasters is, first, which interfaces are most stable, whether single port, quad port, or Gbit, and whether anyone has had any experience with the relative reliability of fast etherchannel vs. Gbit.
I realize a single Gbit interface is theoretically less reliable than fast etherchannel because it is a single link, but what I fear about etherchannel is that a tricky etherchannel implementation combined with flaky NICs could be significantly less reliable in practice than a solid Gbit NIC. Certainly the load-balancing algorithms for etherchannel on Cisco switches are not trivial.
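To make the load-balancing concern concrete, here is a toy Python model of address-based link selection in a bundle. It assumes a simplified scheme that XORs the last octet of the source and destination MAC addresses and takes the result modulo the number of active member links; real Cisco algorithms vary by platform and configuration, so treat this only as an illustration. The MAC addresses and interface names are hypothetical.

```python
from typing import List


def select_link(src_mac: str, dst_mac: str, active_links: List[str]) -> str:
    """Pick a member link by XOR-ing the last octets of two MAC addresses.

    Simplified, hypothetical hash: real switch algorithms differ by platform.
    """
    if not active_links:
        raise RuntimeError("no active member links left in the bundle")
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    # Deterministic: the same address pair always maps to the same link.
    return active_links[(src_last ^ dst_last) % len(active_links)]


if __name__ == "__main__":
    filer, router = "00:a0:98:00:00:01", "00:10:7b:00:00:02"
    print(select_link(filer, router, ["e0", "e1", "e2", "e3"]))  # e3
    # If e3 fails, the same conversation is rehashed onto a surviving link.
    print(select_link(filer, router, ["e0", "e1", "e2"]))        # e0
```

In this model, all traffic between one filer and one router MAC rides a single member link, so no single conversation gets more than 100 Mbit/s, and when a member drops out every conversation is rehashed over the survivors. That rehashing step is exactly where a flaky NIC plus a tricky implementation could hurt.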
Network traffic is not much of an issue. The current single 100TX-FD connections are hardly taxed, with peak loads reaching only 35 Mbit/s. Either etherchannel or Gbit would provide enough capacity for the next year, possibly longer. The only real issue I have to address is reliability.
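The headroom arithmetic, as a small Python sketch. It assumes the 35 Mbit/s peak is measured in one direction and that a bundle's nominal capacity is simply the sum of its 100 Mbit/s members (in practice a single conversation on a bundle is still limited to one member link).

```python
PEAK_MBIT_S = 35.0  # observed peak load from the post

# Nominal per-direction capacity of each option, in Mbit/s (assumed figures).
OPTIONS = {
    "single 100TX-FD": 100.0,
    "2-port Fast EtherChannel": 200.0,
    "4-port Fast EtherChannel": 400.0,
    "Gigabit Ethernet": 1000.0,
}


def utilization_percent(peak: float, capacity: float) -> float:
    """Peak load expressed as a percentage of link capacity."""
    return 100.0 * peak / capacity


if __name__ == "__main__":
    for name, capacity in OPTIONS.items():
        pct = utilization_percent(PEAK_MBIT_S, capacity)
        print(f"{name:26s} {pct:6.2f}% of capacity at peak")
```

Even the current single link runs at about a third of capacity at peak, which supports the point that the choice between etherchannel and Gbit is about reliability, not bandwidth.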
ejt
One thing that I looked at recently was using "single-mode trunking". It's like etherchannel, but only one connection is active at a time. I tested it and it worked great for a single filer. (All my filers are clusters, and the config for that is much more complicated.)
Try looking at this: http://now.netapp.com/knowledge/docs/ontap/rel53/html/sag/net14.htm#1186750
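As a rough mental model of single-mode trunking (one active link, the others standing by), here is a toy Python sketch. It is not how Data ONTAP implements the feature; it only assumes the behavior described above: traffic uses one link, and when that link goes down, the first healthy standby takes over. The interface names e0 and e4a are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SingleModeTrunk:
    """Toy model: exactly one member link carries traffic; the rest are standbys."""
    links: List[str]  # member links, in order of preference
    up: Dict[str, bool] = field(default_factory=dict)
    active: Optional[str] = None

    def __post_init__(self) -> None:
        # Assume every member starts out healthy unless told otherwise.
        for link in self.links:
            self.up.setdefault(link, True)
        self.active = self._first_healthy()

    def _first_healthy(self) -> Optional[str]:
        for link in self.links:
            if self.up[link]:
                return link
        return None  # every member is down

    def link_state(self, link: str, is_up: bool) -> Optional[str]:
        """Record a link up/down event and return the link now carrying traffic."""
        self.up[link] = is_up
        # Move traffic only if the active link is gone; this sketch never fails back.
        if self.active is None or not self.up[self.active]:
            self.active = self._first_healthy()
        return self.active


if __name__ == "__main__":
    trunk = SingleModeTrunk(["e0", "e4a"])
    print(trunk.active)                   # e0
    print(trunk.link_state("e0", False))  # e4a
    print(trunk.link_state("e0", True))   # e4a (no failback in this sketch)
```

Not failing back is a deliberate choice in the sketch: with a flapping NIC, moving traffic back every time the link briefly returns would just spread the flapping to the trunk.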
Aaron
On Aug 02, Ethan Torretta ethantor@corp.webtv.net wrote:
Today, after yet another service interruption from a failed NIC, I realized the obvious, that I should have redundant or at least more reliable network interfaces on all the filers. The easy short-term solution is to have configured but downed second interfaces ready for quick duty, so I'll be setting that up tomorrow. What I'd like to hear from toasters is, first, which interfaces are most stable, whether single port, quad port, or Gbit, and whether anyone has had any experience with the relative reliability of fast etherchannel vs. Gbit.
I realize a single Gbit interface is theoretically less reliable than fast etherchannel because it is a single link, but what I fear about etherchannel is that a tricky etherchannel implementation combined with flaky NICs could be significantly less reliable in practice than a solid Gbit NIC. Certainly the load-balancing algorithms for etherchannel on Cisco switches are not trivial.
Network traffic is not much of an issue. The current single 100TX-FD connections are hardly taxed, with peak loads reaching only 35 Mbit/s. Either etherchannel or Gbit would provide enough capacity for the next year, possibly longer. The only real issue I have to address is reliability.
ejt
What I'd like to hear from toasters is, first, which interfaces are most stable, whether single port, quad port, or Gbit, and whether anyone has had any experience with the relative reliability of fast etherchannel vs. Gbit.
I've got the same problems. Seems to me like the on-board NIC on the 760s is flakier than the QFE. But I've seen both sit there in a state of 'flapping', just going up-down, up-down.
Thankfully the filers have nice NIC redundancy with virtual interfaces, etc., but that functionality is not available on the caches.
How many others have seen the same problems?
On Aug 04, Mark Rogers markrogers@int.ozemail.com.au wrote:
What I'd like to hear from toasters is, first, which interfaces are most stable, whether single port, quad port, or Gbit, and whether anyone has had any experience with the relative reliability of fast etherchannel vs. Gbit.
I've got the same problems. Seems to me like the on-board NIC on the 760s is flakier than the QFE. But I've seen both sit there in a state of 'flapping', just going up-down, up-down.
Flapping? I've seen that... and sometimes it ends up stuck in the down state. Thankfully, I can do a 'cf takeover' from the partner and clear the problem. I've had most of my faults on 760s with the onboard interface. Only once or twice have I seen it happen on a 630 using a NIC card.
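To catch a flapping interface before it settles into the down state, one approach is to count up/down transitions inside a sliding time window. The Python sketch below assumes you can feed it timestamped link-state events from somewhere (for example, parsed from the filer's console messages; that parsing is not shown). The threshold of 4 transitions in 60 seconds is an arbitrary, hypothetical choice.

```python
from collections import deque
from typing import Deque, Optional


class FlapDetector:
    """Flags a link as flapping if it changes state too often within a time window."""

    def __init__(self, max_transitions: int = 4, window_seconds: float = 60.0) -> None:
        self.max_transitions = max_transitions
        self.window = window_seconds
        self.transitions: Deque[float] = deque()  # timestamps of state changes
        self.last_state: Optional[bool] = None

    def observe(self, timestamp: float, is_up: bool) -> bool:
        """Record the link state at `timestamp`; return True if it looks like flapping."""
        if self.last_state is not None and is_up != self.last_state:
            self.transitions.append(timestamp)
        self.last_state = is_up
        # Forget transitions that have aged out of the window.
        while self.transitions and timestamp - self.transitions[0] > self.window:
            self.transitions.popleft()
        return len(self.transitions) >= self.max_transitions


if __name__ == "__main__":
    detector = FlapDetector()
    for t, state in [(0, True), (5, False), (10, True), (15, False), (20, True)]:
        print(t, detector.observe(t, state))
```

Using a window rather than a running total means a link that bounced once last week is not flagged, while one going up-down, up-down is flagged within seconds, early enough to fail over deliberately instead of waiting for it to stick in the down state.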
Makes me feel a little better that someone else has seen this.
Aaron
I have to agree here. After upgrading my F520s to 5.3.X and BudTool to 4.6.X, I have had nothing but nightmares with respect to my backups, leaving me with a large amount of heartburn. My backups are failing due to bug #13252. This is another NDMP/Java problem. Here is the problem summary:
Problem Summary: The problem is inherent in any process whose stack is VM-mapped and that touches VM-mapped pages. The problem has only been seen on the F5xx hardware platform.
In the case of 5.3, NDMP uses Java extensively, and the Java garbage collector has a VM-mapped stack. Thus this problem is most prevalent when:
- running 5.3
- during NDMP
- on F5xx hardware
NetApp does not have a solution (or an idea of when there will be one), but their workaround is:
Recommended Solution/Workaround: 13252 is OPEN
The chance of this problem occurring during NDMP can be minimized by:
1. Turning off file history during NDMP backups
2. Reducing the size of the dump (no size is suggested, except that the bigger the dump, the more likely you are to see the problem).
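For the second workaround, one way to keep each dump small is to split a volume into several jobs by subtree. The Python sketch below packs subtrees into jobs under a size cap using a first-fit-decreasing heuristic. It assumes your backup software can point separate NDMP jobs at subtrees (such as qtrees) of a volume; the paths, sizes, and the 50 GB cap are hypothetical, since NetApp suggests no particular size.

```python
from typing import Dict, List


def split_into_jobs(sizes_gb: Dict[str, float], cap_gb: float) -> List[List[str]]:
    """First-fit decreasing: pack subtrees into dump jobs no larger than cap_gb."""
    jobs: List[List[str]] = []
    totals: List[float] = []
    # Place the largest subtrees first; that tends to leave fewer, fuller jobs.
    for name, size in sorted(sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        for i, total in enumerate(totals):
            if total + size <= cap_gb:
                jobs[i].append(name)
                totals[i] += size
                break
        else:
            # No existing job has room (or the subtree alone exceeds the cap):
            # start a new job for it.
            jobs.append([name])
            totals.append(size)
    return jobs


if __name__ == "__main__":
    subtrees = {  # hypothetical paths and sizes in GB
        "/vol/vol0/home": 40,
        "/vol/vol0/proj": 25,
        "/vol/vol0/src": 20,
        "/vol/vol0/tmp": 10,
    }
    for job in split_into_jobs(subtrees, cap_gb=50):
        print(job)
```

Smaller jobs should also mean a failure costs you one job's worth of rerun rather than the whole volume.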
I'm really wishing I had not decided to go down the NDMP path.
On Mon, Aug 02, Shaun T. Erickson wrote:
I have little faith in NetApp's ability to properly implement this protocol. Backups are the single most important function I manage as an administrator, and NetApp seems hell-bent on making that process a nightmare for me, rather than providing me with a rock-solid solution.
On Tue, 3 Aug 1999 mgx@spruce.lsd.ornl.gov wrote:
The chance of this problem occurring during NDMP can be minimized by:
- Turning off file history during NDMP backups
Does anyone know how to do this with Veritas NetBackup? I know how to do it with BudTool, but it does not appear to be possible with NetBackup. I hope I'm just missing something.
ejt