This is my second attempt to send this to the list. If it shows up twice, I apologize.
We had a situation last year in which one of the aggregates on our R100 suffered dirty parity in one of its RAID groups, which forced us to take the filer down for several hours while we ran WAFL_check on that aggregate. Given this, I have a few questions:
1. What is the likelihood of getting dirty parity? I assume the likelihood goes up dramatically on a box like the R100 given that the weekly RAID scrubs seldom if ever complete.
2. If this had occurred on a cluster, could we take one node down to run the WAFL_Check, while the other node continues to serve data from the other aggregates?
3. How many of you have encountered something like this and decided to go with SyncMirror to protect against such a thing?
Thanks,
--Carl
You could always change your raid scrub duration option to -1; that way it will run to completion.
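For reference, a minimal console sketch of that change, assuming the usual 7-mode options syntax (the option is raid.scrub.duration; double-check the name and default on your release):

    filer> options raid.scrub.duration
           (prints the current value; on the releases I have seen it is a duration in minutes)
    filer> options raid.scrub.duration -1
           (-1 removes the time limit, so each scheduled scrub runs until it finishes)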
Hello,
Since we are talking about scrubbing: is there a way to see whether the raid.scrub completes? Is this posted to the message log, or is there a command to check it?
On the point about cluster functionality during a WAFL_check, I would guess that you have to disable the cluster before running this command. This is true for most operations you run during the boot process. My thinking is that WAFL_check cannot work correctly if the data it has to check is being changed all the time.
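If that guess is right, the rough sequence would look something like this (a hedged sketch: cf disable/enable and halt are standard commands, but the exact way WAFL_check is invoked at boot varies by Data ONTAP release, so confirm with NetApp support before relying on it):

    filer> cf disable
           (turn off cluster failover so the partner does not try to take over)
    filer> halt
           (then interrupt the boot sequence and run WAFL_check against the
            affected aggregate; the exact invocation depends on the release)
    filer> cf enable
           (once the node is back up and the aggregate is clean, re-enable failover)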
Regards
Jochen
If you want the RAID scrubs to run to completion every week, then set the option raid.scrub.duration = -1. It always logs to the /etc/messages file when it completes. If you have not set the option, then it will tell you where it left off; when it starts back up, it will tell you where it is picking back up.
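For a quick check, assuming the usual 7-mode commands (aggr scrub status and rdfile should both be there, but confirm on your version):

    filer> aggr scrub status -v
           (shows, per RAID group, whether a scrub is running or suspended and how far it got)
    filer> rdfile /etc/messages
           (the scrub start/complete/suspend events are logged here, as noted above)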
--
--tmac
RedHat Certified Engineer
Principal Consultant, RABA Technologies
240-373-3926 (office)  301-688-4705 (Lab)  214-279-3926 (eFAX)
I guess a better way to state the question is: what is the probability that something will happen at the RAID level (dirty parity, for example) that will force you to take your cluster down? The thing that scares me is that whether you've purchased a FAS270 or a FAS6070-HA, you will still have to take the cluster down to handle something that has gone wrong at the RAID/WAFL level. Right? You would have to use SyncMirror to be protected from something like this.
Jochen, the aggregate in my example would be offline. But I am fairly certain now that you are correct. The cluster would have to be down to fix the aggregate.
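For context, going the SyncMirror route would look roughly like the following (a hedged sketch: aggr mirror and the syncmirror_local license are my understanding of the 7-mode commands, aggr1 is just a placeholder name, and the disk pool assignments would need to be planned out for real):

    filer> license add XXXXXXX
           (add the syncmirror_local license; XXXXXXX stands in for the real code)
    filer> aggr mirror aggr1
           (mirror the existing aggregate onto a second plex built from disks in the other pool)
    filer> aggr status -v aggr1
           (confirm both plexes are online; exact output varies by release)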
--Carl