RE: raid scrubbing and hot spares

sirbruce＠ix.netcom.com

28 Oct 1998

5:28 a.m.

On 10/27/98 13:48:40 you wrote:

...

Hmm... That's interesting. Is there a CPU usage threshold before the filer becomes "Busy"? Is it 60%, 70% or what?

It's probably "busy" in the sense of heavy disk activity, which might be associated with high CPU usage but not necessarily. As someone else noted, frequent rdates can do it too. Perhaps you're running a backup?

I know certain models and certain OSes are less prone to this, but I don't know the details of how NetApp currently views the situation.

Bruce


Jason D. Kelleher

28 Oct

9:30 a.m.

New subject: raid scrubbing and hot spares

In message 19981027212824341@ix.netcom.com, sirbruce@ix.netcom.com writes:

...

On 10/27/98 13:48:40 you wrote:

...
Hmm... That's interesting. Is there a CPU usage threshold before the filer becomes "Busy"? Is it 60%, 70% or what?

It's probably "busy" in the sense of heavy disk activity, which might be associated with high CPU usage but not necessarily. As someone else noted, frequent rdates can do it too. Perhaps you're running a backup?

I know certain models and certain OSes are less prone to this, but I don't know the details of how NetApp currently views the situation.

Bruce

If I recall correctly, NetApp suggests you don't cron rdates to run on the hour, because it's possible to step the clock past the hour and miss a scheduled event such as raid scrubbing or snapshots. I believe I read this on NOW, but I'm not sure.
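To illustrate the failure mode described above, here is a minimal Python sketch that lists the top-of-hour instants a forward clock step jumps over. The function name `skipped_hour_marks` is made up for this example, and the sketch only models the clock arithmetic; it says nothing about how the filer's scheduler is actually implemented.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def skipped_hour_marks(before: datetime, after: datetime) -> list[datetime]:
    """Return every top-of-hour instant strictly between `before` and `after`.

    If the clock is stepped forward from `before` to `after`, these are the
    instants the clock never showed, so an event keyed to one of them
    could be missed.
    """
    if after <= before:
        return []  # stepping backwards (or not at all) skips nothing
    # First top-of-hour instant strictly after `before`.
    mark = before.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    marks = []
    while mark < after:
        marks.append(mark)
        mark += timedelta(hours=1)
    return marks
```

A step from 09:59:58 to 10:00:03 yields a one-element list containing 10:00:00, while a step from 09:37:00 to 09:37:05 yields an empty list.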

jason

---
Jason D. Kelleher                  kelleher@susq.com
Susquehanna Partners, G.P.         610.617.2721 (voice)
401 City Line Ave, Suite 220       610.617.2916 (fax)
Bala Cynwyd, PA 19004-1122

Philip Thomas

4:28 p.m.

New subject: raid scrubbing and hot spares

...

If I recall correctly, NetApp suggests you don't cron rdates to run on the hour, because it's possible to step the clock past the hour and miss a scheduled event such as raid scrubbing or snapshots. I believe I read this on NOW, but I'm not sure.

Interesting! I wish they would provide an alternative solution. They don't provide NTP, and then they discourage us from cron'ing rdate. What do they think? That I have nothing else meaningful to do but sit here and periodically do

remoteshell filer1 "hello brain dead, here is your date/time"
remoteshell filer2 "hello brain dead, here is your date/time"
....
remoteshell filerN "hello brain dead, here is your date/time"

-- Begin original message --

...

From: "Jason D. Kelleher" kelleher@susq.com
Date: Wed, 28 Oct 1998 04:30:27 -0500
Subject: Re: raid scrubbing and hot spares
To: sirbruce@ix.netcom.com
Cc: toasters@mathworks.com

In message 19981027212824341@ix.netcom.com, sirbruce@ix.netcom.com writes:

...
On 10/27/98 13:48:40 you wrote:

...
Hmm... That's interesting. Is there a CPU usage threshold before the filer becomes "Busy"? Is it 60%, 70% or what?

It's probably "busy" in the sense of heavy disk activity, which might be associated with high CPU usage but not necessarily. As someone else noted, frequent rdates can do it too. Perhaps you're running a backup?

I know certain models and certain OSes are less prone to this, but I don't know the details of how NetApp currently views the situation.

Bruce
If I recall correctly, NetApp suggests you don't cron rdates to
run on the hour, because it's possible to step the clock past the
hour and miss a scheduled event such as raid scrubbing or
snapshots.  I believe I read this on NOW, but I'm not sure.

jason
Jason D. Kelleher                  kelleher@susq.com
Susquehanna Partners, G.P.         610.617.2721 (voice)
401 City Line Ave, Suite 220       610.617.2916 (fax)
Bala Cynwyd, PA 19004-1122

-- End original message --

Philip Thomas
Motorola - PEL, M/S M350
2200 W. Broadway M350
Mesa, AZ 85202
rxjs80@email.sps.mot.com
(602) 655-3678
(602) 655-2285 (fax)

Michael Cerda

5 p.m.

New subject: raid scrubbing and hot spares

...

...
If I recall correctly, NetApp suggests you don't cron rdates to run on the hour, because it's possible to step the clock past the hour and miss a scheduled event such as raid scrubbing or snapshots.

...

They don't provide NTP, and then they discourage us from cron'ing rdate. What do they think?

I think the issue is when you run rdate. Just schedule rdate to run 37 minutes after the hour, or 12 minutes after the hour. If those involve a crossing of the hour boundary then you have other problems.
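A minimal sketch of that off-the-hour schedule, in Python: it builds one crontab line per filer at a chosen minute. The filer names, the time host, and the `rsh <filer> rdate <host>` command form are assumptions for illustration, not something confirmed in this thread.

```python
def rdate_cron_lines(filers, time_host, minute=37):
    """Build one crontab entry per filer that runs rdate away from the hour.

    `filers` and `time_host` are hypothetical host names supplied by the caller.
    """
    if not 1 <= minute <= 59:
        raise ValueError("pick a minute away from the top of the hour (1-59)")
    # Assumed command form: rsh <filer> rdate <time_host>
    return [f"{minute} * * * * rsh {filer} rdate {time_host}" for filer in filers]
```

Calling `rdate_cron_lines(["filer1", "filer2"], "timehost")` returns two lines, the first being `37 * * * * rsh filer1 rdate timehost`. Picking a minute well inside the hour keeps a small correction from landing on the boundary where scheduled events fire.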

-Michael Cerda

Nick Hilliard

7:17 p.m.

New subject: raid scrubbing and hot spares

...

They don't provide NTP, and then they discourage us from cron'ing rdate. What do they think? That I have nothing else meaningful to do but sit here and periodically do

Hey - chill out. NTP is on the way. We all know that rsh/cron/rdate is brain-damaged.

Besides, they don't discourage us from cron'ing rdate. It's just pretty dumb to run any time-independent process at the top of the hour, let alone rdate.

Nick Hilliard Ireland On-Line System Operations

Philip Thomas

7:45 p.m.

New subject: raid scrubbing and hot spares

...

...
They don't provide NTP, and then they discourage us from cron'ing rdate. What do they think? That I have nothing else meaningful to do but sit here and periodically do

...

Hey - chill out. NTP is on the way. We all know that rsh/cron/rdate is brain-damaged.

"chill-out"?? I am 'frozen', waiting (since Feb 1994) for this elusive NTP from NetApp. :-)

-- Begin original message --

...

From: Nick Hilliard nick@iol.ie
Date: Wed, 28 Oct 1998 19:17:00 +0000 (GMT)
Subject: Re: raid scrubbing and hot spares
To: thomas@act.sps.mot.com (Philip Thomas)
Cc: toasters@mathworks.com

...
They don't provide NTP, and then they discourage us from cron'ing rdate. What do they think? That I have nothing else meaningful to do but sit here and periodically do

Hey - chill out. NTP is on the way. We all know that rsh/cron/rdate is brain-damaged.

Besides, they don't discourage us from cron'ing rdate. It's just pretty dumb to run any time-independent process at the top of the hour, let alone rdate.

Nick Hilliard Ireland On-Line System Operations

-- End original message --

Philip Thomas
Motorola - PEL, M/S M350
2200 W. Broadway M350
Mesa, AZ 85202
rxjs80@email.sps.mot.com
(602) 655-3678
(602) 655-2285 (fax)

Graham C. Knight

8:45 p.m.

New subject: ndmpcopy question

Hello, I'm hoping that someone can tell me what is causing this error with ndmpcopy:

...

ndmpcopy mach:/mach5 rocket:/mach5 -sa root:password -da root:password

Connecting to mach.
ERROR: comm.c:393 system call (connect): Connection refused
Could not open NDMP connection to host mach.

mach is at OS rev 4.3.1D1 - Any ideas??

Thanks, Graham

alexei＠cimedia.com

9:54 p.m.

New subject: ndmpcopy question

+--- In a previous state of mind, "Graham C. Knight" grahamk@ast.lmco.com wrote:
|
| >ndmpcopy mach:/mach5 rocket:/mach5 -sa root:password -da root:password
|
| Connecting to mach.
| ERROR: comm.c:393 system call (connect): Connection refused
| Could not open NDMP connection to host mach.

Is the ndmpd process running on mach?

Alexei

Stephen C. Losen

10:49 p.m.

New subject: ndmpcopy question

...

Hello, I'm hoping that someone can tell me what is causing this error with ndmpcopy:

...
ndmpcopy mach:/mach5 rocket:/mach5 -sa root:password -da root:password

Connecting to mach.
ERROR: comm.c:393 system call (connect): Connection refused
Could not open NDMP connection to host mach.

mach is at OS rev 4.3.1D1 - Any ideas??

Run this on the filer to start the NDMP daemon:

ndmpd on

This doesn't work from rsh. You might want this in your /etc/rc files so the daemon starts at boot.
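Before rerunning ndmpcopy, it can help to confirm that something is actually accepting NDMP connections on the source filer. The sketch below is a plain TCP connect test in Python, assuming the filer listens on the usual NDMP port, 10000; the function name `ndmp_port_open` is made up for this example. A successful connect only shows that the port is open, not that authentication or the copy itself will work.

```python
import socket

NDMP_PORT = 10000  # assumed: the TCP port normally used by NDMP


def ndmp_port_open(host, port=NDMP_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name lookup failures.
        return False
```

For the case in this thread, `ndmp_port_open("mach")` returning False would match the "Connection refused" error and suggest that ndmpd is not running on mach.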

You may find, as I did, that ndmpcopy hangs after the transfer. Ndmpcopy prints a message saying that one filer is done and it is waiting for the other filer to finish, and it waits forever.

This is some sort of timing issue where the restore finishes before the dump. I think this happens because the dump filer deletes its snapshot and that can take several seconds. Meanwhile the restore filer finishes and exits. So somehow the dump filer gets confused and never exits, so ndmpcopy just sits there. The dump filer still works OK, but some sort of busy loop soaks up the spare CPU, so it sits at 100%. You have to ctrl-c ndmpcopy to terminate, and the dump filer goes back to normal.

NOW BEWARE! Once you test a few times and discover that ndmpcopy tends to hang, you can get into the habit of just interrupting it whenever you see that message. DON'T. If you are copying a very large amount of data, the restore will probably finish well after the dump. This is because the restore does a final pass (after all data has transferred) to set the correct owner and permissions on all directories. If you interrupt ndmpcopy while this is going on, you are left with a bunch of dirs owned by root, with permissions 777 on the restore filer. Can you guess how I found this out?? :-) If the restore finishes after the dump, ndmpcopy exits cleanly. Depending on how many dirs you transferred, you may have to wait a minute or two for that last restore pass to finish.

So never interrupt ndmpcopy until you see the HALT message from the destination (restore) filer.
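One way to avoid interrupting too early is to let a small wrapper watch the output for you. The Python sketch below runs a command, echoes its output, and prints a notice once a line mentions both the destination filer and the word HALT. The exact wording of ndmpcopy's messages is not shown in this thread, so the matching rule (host name plus "HALT" on the same line) is a guess, and `run_until_halt` is a made-up name.

```python
import subprocess


def run_until_halt(cmd, dest_filer, marker="HALT"):
    """Run cmd, echo its output, and report whether the destination's HALT line appeared.

    The match (dest_filer and marker on the same line) is an assumption about
    the output format, not something taken from ndmpcopy itself.
    """
    seen = False
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        print(line, end="")
        if dest_filer in line and marker in line:
            seen = True
            print(f"[{dest_filer} reported {marker}; interrupting is now safe]")
    proc.wait()
    return seen
```

With the hosts from this thread, the command would be the ndmpcopy invocation as a list of arguments and `dest_filer` would be "rocket". If ndmpcopy hangs after the notice appears, ctrl-c at that point should be harmless; before the notice, keep waiting.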

Steve Losen scl@virginia.edu phone: 804-924-0640

University of Virginia ITC Unix Support
