On 10/27/98 13:48:40 you wrote:
> Hmm... That's interesting. Is there a CPU usage threshold before the filer becomes "Busy"? Is it 60%, 70% or what?
It's probably "busy" in the sense of heavy disk activity, which might be associated with high CPU usage but not necessarily. As someone else noted, frequent rdates can do it too. Perhaps you're running a backup?
I know certain models and certain OSes are less prone to this, but I don't know the details of how NetApp currently views the situation.
Bruce
In message 19981027212824341@ix.netcom.com, sirbruce@ix.netcom.com writes:
> On 10/27/98 13:48:40 you wrote:
> > Hmm... That's interesting. Is there a CPU usage threshold before the filer becomes "Busy"? Is it 60%, 70% or what?
> It's probably "busy" in the sense of heavy disk activity, which might be associated with high CPU usage but not necessarily. As someone else noted, frequent rdates can do it too. Perhaps running a backup?
> I know certain models and certain OSes are less prone to this, but I don't know the details of how NetApp currently views the situation.
> Bruce
If I recall correctly, NetApp suggests you don't cron rdates to run on the hour, because it's possible to step the clock past the hour and miss a scheduled event such as RAID scrubbing or snapshots. I believe I read this on NOW, but I'm not sure.
jason
--- Jason D. Kelleher kelleher@susq.com Susquehanna Partners, G.P. 610.617.2721 (voice) 401 City Line Ave, Suite 220 610.617.2916 (fax) Bala Cynwyd, PA 19004-1122
> If I recall correctly, NetApp suggests you don't cron rdate's to run on the hour because it's possible to step the clock past the hour and miss a scheduled event such as raid scrubbing or snapshots. I believe I read this on NOW, not sure.
Interesting! I wish they would provide an alternate solution. They don't provide NTP and then they discourage us from cron'ing rdate. What do they think? I have nothing else meaningful to do, but sit here and periodically do
remoteshell filer1 "hello brain dead, here is your date/time"
remoteshell filer2 "hello brain dead, here is your date/time"
....
remoteshell filerN "hello brain dead, here is your date/time"
Philip Thomas Motorola - PEL, M/S M350 2200 W. Broadway M350 Mesa, AZ 85202 rxjs80@email.sps.mot.com (602) 655-3678 (602) 655-2285 (fax)
> If I recall correctly, NetApp suggests you don't cron rdate's to run on the hour because it's possible to step the clock past the hour and miss a scheduled event such as raid scrubbing or snapshots.
> They don't provide NTP and then they discourage us from cron'ing rdate. What do they think?
I think the issue is when you run rdate. Just schedule rdate to run 37 minutes after the hour, or 12 minutes after the hour. If those involve a crossing of the hour boundary then you have other problems.
-Michael Cerda
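A minimal sketch of the hour-boundary reasoning above, assuming the concern is simply that a clock step carries the filer's time from one hour into another and so skips the instant a scheduled event would fire. The function name crosses_hour and the sample times are hypothetical, not anything from NetApp:

```python
from datetime import datetime, timedelta


def top_of_hour(t: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return t.replace(minute=0, second=0, microsecond=0)


def crosses_hour(before: datetime, step: timedelta) -> bool:
    """Return True if stepping the clock by `step` lands in a different hour.

    `before` is the filer's clock just before the correction is applied and
    `step` is the correction (positive = clock was slow).
    """
    after = before + step
    return top_of_hour(before) != top_of_hour(after)


# A filer that is 5 seconds slow, corrected right at the top of the hour:
# 13:59:58 jumps to 14:00:03, so 14:00:00 never occurs on that clock.
at_the_hour = crosses_hour(datetime(1998, 10, 28, 13, 59, 58), timedelta(seconds=5))

# The same 5-second correction applied at 37 minutes past stays inside the hour.
at_37_past = crosses_hour(datetime(1998, 10, 28, 14, 37, 0), timedelta(seconds=5))

# Only a correction of more than 23 minutes would cross the boundary from :37.
huge_step = crosses_hour(datetime(1998, 10, 28, 14, 37, 0), timedelta(minutes=24))
```

Here at_the_hour and huge_step are True and at_37_past is False, which is the point made above: away from the top of the hour, only a very large correction can cross the boundary, and a clock that far off is a different problem.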
> They don't provide NTP and then they discourage us from cron'ing rdate. What do they think? I have nothing else meaningful to do, but sit here and periodically do
Hey - chill out. NTP is on the way. We all know that rsh/cron/rdate is brain-damaged.
Besides, they don't discourage us from cron'ing rdate. It's just pretty dumb to run any time-independent process at the top of the hour, let alone rdate.
Nick Hilliard Ireland On-Line System Operations
> > They don't provide NTP and then they discourage us from cron'ing rdate. What do they think? I have nothing else meaningful to do, but sit here and periodically do
> Hey - chill out. NTP is on the way. We all know that rsh/cron/rdate is brain-damaged.
"chill-out"?? I am 'frozen', waiting (since Feb 1994) for this elusive NTP from NetApp. :-)
Philip Thomas Motorola - PEL, M/S M350 2200 W. Broadway M350 Mesa, AZ 85202 rxjs80@email.sps.mot.com (602) 655-3678 (602) 655-2285 (fax)
Hello, I'm hoping that someone can tell me what is causing this error with ndmpcopy:
ndmpcopy mach:/mach5 rocket:/mach5 -sa root:password -da root:password
Connecting to mach.
ERROR: comm.c:393 system call (connect): Connection refused
Could not open NDMP connection to host mach.
mach is at OS rev 4.3.1D1 - Any ideas??
Thanks, Graham
+--- In a previous state of mind, "Graham C. Knight" grahamk@ast.lmco.com wrote:
|
| >ndmpcopy mach:/mach5 rocket:/mach5 -sa root:password -da root:password
|
| Connecting to mach.
| ERROR: comm.c:393 system call (connect): Connection refused
| Could not open NDMP connection to host mach.
Is the ndmpd process running on mach?
Alexei
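One way to check this from a client without logging in to the filer is to try a plain TCP connection to the NDMP port. This is only a rough sketch: it assumes the filer listens on TCP port 10000 (the port registered for NDMP) and that the client can reach it directly. The function name ndmp_port_open is made up for illustration:

```python
import socket

NDMP_PORT = 10000  # assumption: the filer uses the registered NDMP port


def ndmp_port_open(host: str, port: int = NDMP_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    A refused connection, a timeout, or a name-lookup failure all return
    False; this only shows that something is listening, not that the
    NDMP session itself would succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling ndmp_port_open("mach") and getting False would be consistent with the "Connection refused" error quoted above, though a firewall or a wrong hostname would look the same from the client side.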
> Hello, I'm hoping that someone can tell me what is causing this error with ndmpcopy:
> ndmpcopy mach:/mach5 rocket:/mach5 -sa root:password -da root:password
> Connecting to mach.
> ERROR: comm.c:393 system call (connect): Connection refused
> Could not open NDMP connection to host mach.
> mach is at OS rev 4.3.1D1 - Any ideas??
Log in to mach and rocket and run
ndmpd on
This doesn't work from rsh. You might want this in your /etc/rc files.
You may find, as I did, that ndmpcopy hangs after the transfer. Ndmpcopy prints a message saying that one filer is done and it is waiting for the other filer to finish, and it waits forever.
This is some sort of timing issue where the restore finishes before the dump. I think this happens because the dump filer deletes its snapshot and that can take several seconds. Meanwhile the restore filer finishes and exits. So somehow the dump filer gets confused and never exits, so ndmpcopy just sits there. The dump filer still works OK, but some sort of busy loop soaks up the spare CPU, so it sits at 100%. You have to ctrl-c ndmpcopy to terminate, and the dump filer goes back to normal.
NOW BEWARE! Once you test a few times and discover that ndmpcopy tends to hang, you can get into the habit of just interrupting it whenever you see that message. DON'T. If you are copying a very large amount of data, the restore will probably finish well after the dump. This is because the restore does a final pass (after all data has transferred) to set the correct owner and permissions on all directories. If you interrupt ndmpcopy while this is going on, you are left with a bunch of dirs owned by root, with permissions 777 on the restore filer. Can you guess how I found this out?? :-) If the restore finishes after the dump, ndmpcopy exits cleanly. Depending on how many dirs you transferred, you may have to wait a minute or two for that last restore pass to finish.
So never interrupt ndmpcopy until you see the HALT message from the destination (restore) filer.
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support