This is my second attempt to send this to the list. If it shows up twice, I apologize.
We had a situation last year in which one of the aggregates on our R100 suffered dirty parity in one of its RAID groups, which forced us to take the filer down for several hours while we ran WAFL_check on that aggregate. Given this, I have a few questions:
1. What is the likelihood of getting dirty parity? I assume the likelihood goes up dramatically on a box like the R100 given that the weekly RAID scrubs seldom if ever complete.
2. If this had occurred on a cluster, could we take one node down to run the WAFL_Check, while the other node continues to serve data from the other aggregates?
3. How many of you have encountered something like this and decided to go with SyncMirror to protect against such a thing?
Thanks,
--Carl
You could always change your raid scrub duration option to -1; that way it will run to completion.
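For reference, a minimal console sketch of that change, assuming the usual 7-mode options syntax (the option is raid.scrub.duration; double-check the name and default on your release):

    filer> options raid.scrub.duration
           (prints the current value; on the releases I have seen it is a duration in minutes)
    filer> options raid.scrub.duration -1
           (-1 removes the time limit, so each scheduled scrub runs until it finishes)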
Hello,
Since we are talking about scrubbing: is there a way to see whether the raid.scrub completes? Is this posted to the message log, or is there a command to check it?
On the point about cluster functionality during a WAFL_check, I would guess that you have to disable the cluster before running this command. This is true for most operations you run during the boot process. My thinking is that WAFL_check cannot work correctly if the data it has to check is being changed all the time.
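If that guess is right, the rough sequence would look something like this (a hedged sketch: cf disable/enable and halt are standard commands, but the exact way WAFL_check is invoked at boot varies by Data ONTAP release, so confirm with NetApp support before relying on it):

    filer> cf disable
           (turn off cluster failover so the partner does not try to take over)
    filer> halt
           (then interrupt the boot sequence and run WAFL_check against the
            affected aggregate; the exact invocation depends on the release)
    filer> cf enable
           (once the node is back up and the aggregate is clean, re-enable failover)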
Regards
Jochen
If you want the RAID scrubs to run to completion every week, then set the option raid.scrub.duration = -1. It always logs to the /etc/messages file when it completes. If you have not set the option, then it will tell you where it left off; when it starts back up, it will tell you where it is picking back up.
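For a quick check, assuming the usual 7-mode commands (aggr scrub status and rdfile should both be there, but confirm on your version):

    filer> aggr scrub status -v
           (shows, per RAID group, whether a scrub is running or suspended and how far it got)
    filer> rdfile /etc/messages
           (the scrub start/complete/suspend events are logged here, as noted above)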
--
--tmac
RedHat Certified Engineer
Principal Consultant, RABA Technologies
240-373-3926 (office)  301-688-4705 (Lab)  214-279-3926 (eFAX)
I guess a better way to state the question is: what is the probability that something will happen at the RAID level (dirty parity, for example) that will force you to take your cluster down? The thing that scares me is that whether you've purchased a FAS270 or a FAS6070-HA, you will still have to take the cluster down to handle something that has gone wrong at the RAID/WAFL level. Right? You would have to use SyncMirror to be protected from something like this.
Jochen, the aggregate in my example would be offline. But I am fairly certain now that you are correct. The cluster would have to be down to fix the aggregate.
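For context, going the SyncMirror route would look roughly like the following (a hedged sketch: aggr mirror and the syncmirror_local license are my understanding of the 7-mode commands, aggr1 is just a placeholder name, and the disk pool assignments would need to be planned out for real):

    filer> license add XXXXXXX
           (add the syncmirror_local license; XXXXXXX stands in for the real code)
    filer> aggr mirror aggr1
           (mirror the existing aggregate onto a second plex built from disks in the other pool)
    filer> aggr status -v aggr1
           (confirm both plexes are online; exact output varies by release)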
--Carl