Unfortunately, I've not been enlightened as to performance numbers for varying small aggregate sizes with one raid group, so I guess I'll have to be content with shooting in the dark: start out small and grow from there if the performance is too lousy. Regardless, many comments were still helpful and they are much appreciated.
But now to a tangentially related subject:
From recent posts, one might be led to believe that it really doesn't
matter much at all what the sizes of your raid groups are within an aggregate. One might get the idea that an aggregate with, say, 3 raid groups of size 6, 12 and 18 wouldn't perform much differently than one with 3 identical raid groups of size 12 because what matters far more than anything else is the data disk count of an aggregate:
For instance:
"...there is no reason one 4+2 and two 16+2s cant perform quite well. There is no unreasonable penalty to doing so ... if the right tools are used to manage it."
"Performance of an aggregate is the total sum of spindles across the entire aggregate."
And
"Forget parity as a limitation to system performance"
However, NetApp's recommendations on raid group sizing within the aggregate might lead one to believe somewhat the opposite:
"the recommended sizing approach is to establish a RAID group size that ... achieves an even RAID group layout (all RAID groups contain the same number of drives). If multiple RAID group sizes achieve an even RAID group layout, NetApp recommends using the higher RAID group size value within the range."
and
"Drive deficiencies should be distributed across RAID groups so that no single RAID group is deficient more than a single drive."
That is, NetApp took the time and trouble to say:
1. RAID group sizes within an aggregate shouldn't vary by more than 1 disk
2. Larger RAID group sizes are better (up to a point)
Which runs rather strongly against the grain of what's been said recently in this forum. Can anyone shed more light on this topic?
For instance, what are the drawbacks (and how bad are they) to, say, adding a 6-disk raid group to an aggregate consisting of one 16-disk raid group (instead of springing for way more disk space than you might need by buying the full 16 disks necessary to even things out)?
Thanks Randall
You will want to read about aggr and vol reallocation (the reallocate command). There was a GREAT blog out there somewhere with cool pictures showing the levels of data as they are laid down. You can start off with a RG size of 16, start with 6 disks, and add (at minimum) 3 disks at a time. Run a reallocate to rebalance, and you are there.
I think that's how it is... Bottom line, it's best to add as many as you can... rinse, repeat... Repeat if still dirty.
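A rough sketch of that growth pattern (my own toy model, not NetApp code; it assumes RAID-DP with two parity disks per group, that a new group starts only once the current one reaches the configured raidsize, and it ignores minimum group-size rules; the disk counts are just for illustration):

    # Toy model: how data/parity counts evolve as you add disks to an
    # aggregate whose raidsize is 16, assuming RAID-DP (2 parity disks per
    # raid group) and that a new group starts only after the current one
    # is full. Purely illustrative.

    RAIDSIZE = 16        # configured raid group size (data + parity)
    PARITY_PER_RG = 2

    def layout(total_disks, raidsize=RAIDSIZE):
        """Return a list of (data, parity) tuples, one per raid group."""
        groups = []
        remaining = total_disks
        while remaining > 0:
            in_group = min(remaining, raidsize)
            groups.append((in_group - PARITY_PER_RG, PARITY_PER_RG))
            remaining -= in_group
        return groups

    # Start with 6 disks (4 data + 2 parity), then add 3 at a time:
    for total in (6, 9, 12, 15, 16, 19):
        rgs = layout(total)
        data = sum(d for d, _ in rgs)
        print(f"{total:2d} disks -> raid groups {rgs}, {data} data disks")

With a raidsize of 16 you can start at 6 disks (4 data + 2 parity) and keep growing the same group until it holds 14 data disks; only then do further adds start a new group.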
On Tue, 1 Nov 2011, Cotton, Randall wrote:
Unfortunately, I've not been enlightened as to performance numbers for varying small aggregate sizes with one raid group, so I guess I'll have
Yeah, sorry about that, but it appears we just don't have that information available to us (and NetApp reps don't seem to want to answer :) ).
But now to a tangentially related subject:
And an interesting one.
"...there is no reason one 4+2 and two 16+2s cant perform quite well. There is no unreasonable penalty to doing so ... if the right tools are used to manage it."
I was somewhat surprised by that comment myself.
That is, NetApp took the time and trouble to say:
1. RAID group sizes within an aggregate shouldn't vary by more than 1 disk
2. Larger RAID group sizes are better (up to a point)
I would certainly stick with NetApp's best practice when it comes to this.
Part of the issue here may be theoretical vs. real world. After all, in theory 'theory' and 'practice' are the same, but in practice they aren't.
It may be that in theory, different rg sizes may not make much difference. However, it's possible that the way NetApp implements rgs within an aggregate makes a difference.
For instance, what are the drawbacks (and how bad are they) to, say, adding a 6-disk raid group to an aggregate consisting of one 16-disk raid group (instead of spilling for way more disk space than you might need by forking over for the full 16 disks necessary to even things out?).
One thing to consider is starting out with a large rg and adding a small one. This also matters if you add a 16-disk rg that's not full yet (you can create a 16 disk rg with 6 disks, and you'll get 2 parity and 4 data until you add more available data disks).
The problem here is that if the 16 disk rg is half full and you add a 6 disk rg, the system will begin filling the 6 disk rg to a proportional size. So, you used to have 14 data disks working for you, and now you have 4 until they're appropriately full. It's certainly possible to force a redistribution of data, which will take some time.
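Here's a toy model of that proportional-fill behaviour (my own simplification of what's described above, not WAFL internals): assume the allocator always writes to the raid group with the lowest percentage used.

    # Toy model: a 16-disk RG (14 data disks) that is 50% full, plus a newly
    # added 6-disk RG (4 data disks). Each write goes to the raid group with
    # the lowest utilization. Capacity units are arbitrary.

    UNIT = 100  # capacity units per data disk

    rgs = [
        {"name": "rg0 (14 data disks)", "cap": 14 * UNIT, "used": 7 * UNIT},  # 50% full
        {"name": "rg1 (4 data disks)",  "cap": 4 * UNIT,  "used": 0},         # brand new
    ]

    hits = {rg["name"]: 0 for rg in rgs}
    for _ in range(200):  # 200 equal-sized writes
        target = min(rgs, key=lambda rg: rg["used"] / rg["cap"])
        target["used"] += 1
        hits[target["name"]] += 1

    for rg in rgs:
        pct = 100 * rg["used"] / rg["cap"]
        print(f'{rg["name"]}: took {hits[rg["name"]]} of 200 writes, now {pct:.0f}% full')

In this toy run every new write lands on the small raid group until its utilization catches up with the old one, which matches the "4 data disks working for you" picture above.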
The other problem is that moving forward if you have optimized your writes for the 16 disk rg stripe size, some rgs are going to be optimized and some aren't, which could conceivably affect performance.
Again, I would go back to NetApp and ask the question outright. They know their product the best, and they want you to speak well of them and buy more product, so they'll want you to get the performance you're looking for.
Hope this helps, -Adam
"Part of the issue here may be theoretical vs. real world. After all, in theory 'theory' and 'practice' are the same, but in practice they aren't." ---
But..they -are-.
I made those statements, and I will stand by them as a relatively knowledgeable person on these matters.
Build yourself a system in your lab (whomever is able) and watch via statit the distribution of IO across the aggregate FAR above the raid-group level.
20+2 will be within all reasonable margins of 2(10+2), or 4(5+2).
As you increase the # of raid groups, there is a minor cost in managing MORE protection zones (raid groups) but the IO is aggregate wide.
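The arithmetic behind that claim, spelled out (hypothetical layouts, RAID-DP assumed, two parity disks per group):

    # Data-spindle count for a few layouts that all protect 20 data disks.
    layouts = {
        "1 x (20+2)": [22],
        "2 x (10+2)": [12, 12],
        "4 x (5+2)":  [7, 7, 7, 7],
    }

    for name, groups in layouts.items():
        total = sum(groups)
        data = sum(g - 2 for g in groups)
        print(f"{name}: {total} disks total, {data} data spindles, "
              f"{total - data} parity disks")

All three layouts expose 20 data spindles to IO; the difference is how many extra parity disks and raid groups (protection zones) you pay for.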
Best practices guide best principles, and they're safe, very safe..but "you have to understand *why things work on a starship*" (fun quote from a fun movie)
"The other problem is that moving forward if you have optimized your writes for the 16 disk rg stripe size, some rgs are going to be optimized and some aren't, which could conceivably affect performance." --- Im trying to understand this statement as well. Writes are optimized for the full stripe of the aggregate. RG size, again, has nothing to do with it as a "stripe".
Raid groups are zones of data protection.
20+2 will be within all reasonable margins of 2(10+2), or 4(5+2).
How about, say, 17+2 with 3+2?
Thanks,
Randall
If created at the same time or reallocated after an add, all data drives will share identical IO patterns.
On Tue, Nov 1, 2011 at 11:21 PM, Jeff Mohler speedtoys.racing@gmail.com wrote:
...
"The other problem is that moving forward if you have optimized your writes for the 16 disk rg stripe size, some rgs are going to be optimized and some aren't, which could conceivably affect performance."
I'm trying to understand this statement as well. Writes are optimized for the full stripe of the aggregate. RG size, again, has nothing to do with it as a "stripe".
Raid groups are zones of data protection.
Actually, that does NOT match my understanding. And does not match what a filer with a multi-rg aggr shows me.
Stripes do appear to be related only to raidgroup size, NOT to total aggr size. Stripes are what needs to be calculated in order to write parity. And parity is a rg-level concept, not an aggr-level one.
I have a filer with one aggr being 1*(8+2) + 1*(9+2). And another aggr being 1*(14+2). (And a small root aggr.)
If I run statit, and look at the stripe data near the top, this is what is output:

     5.18   1 blocks per stripe size 1
    24.65   1 blocks per stripe size 8
    15.32   2 blocks per stripe size 8
    13.64   3 blocks per stripe size 8
     9.63   4 blocks per stripe size 8
     7.73   5 blocks per stripe size 8
     5.62   6 blocks per stripe size 8
     2.41   7 blocks per stripe size 8
     0.58   8 blocks per stripe size 8
    27.13   1 blocks per stripe size 9
    22.83   2 blocks per stripe size 9
    16.56   3 blocks per stripe size 9
    12.98   4 blocks per stripe size 9
     9.55   5 blocks per stripe size 9
     5.32   6 blocks per stripe size 9
     3.72   7 blocks per stripe size 9
     0.80   8 blocks per stripe size 9
     0.66   9 blocks per stripe size 9
     3.14   1 blocks per stripe size 14
     2.19   2 blocks per stripe size 14
     3.87   3 blocks per stripe size 14
     3.43   4 blocks per stripe size 14
     3.57   5 blocks per stripe size 14
     3.43   6 blocks per stripe size 14
     3.87   7 blocks per stripe size 14
     2.63   8 blocks per stripe size 14
     3.28   9 blocks per stripe size 14
     3.06  10 blocks per stripe size 14
     5.25  11 blocks per stripe size 14
     4.74  12 blocks per stripe size 14
     6.27  13 blocks per stripe size 14
     5.03  14 blocks per stripe size 14
Note that the largest stripe shown is 14, from the 14+2 raidgroup. And it shows stripes of size 8 and size 9, from the 8+2 and 9+2 raidgroups. But there is no sign of a stripe that is size 17, or any other combination of the two raidgroups in the larger aggr.
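A quick way to summarize output like that (the line format is assumed from the statit excerpt above; this is not an official statit parser):

    # Rough parser for "N blocks per stripe size S" lines. It groups the
    # percentages by stripe size, which makes it easy to see that the
    # stripe sizes line up with raid group data-disk counts (8, 9, 14),
    # not with any aggregate-wide width.

    import re
    from collections import defaultdict

    SAMPLE = """\
     5.18  1 blocks per stripe size 1
    24.65  1 blocks per stripe size 8
    15.32  2 blocks per stripe size 8
     3.14  1 blocks per stripe size 14
     5.03 14 blocks per stripe size 14
    """  # truncated copy of the output above

    line_re = re.compile(r"\s*([\d.]+)\s+(\d+) blocks per stripe size (\d+)")

    by_stripe = defaultdict(float)
    for line in SAMPLE.splitlines():
        m = line_re.match(line)
        if m:
            pct, _blocks, stripe = float(m[1]), int(m[2]), int(m[3])
            by_stripe[stripe] += pct

    for stripe, pct in sorted(by_stripe.items()):
        print(f"stripe size {stripe:2d}: {pct:5.2f}% of stripes written")

Grouping the percentages by stripe size makes it obvious that the stripe widths track the raid group data-disk counts (8, 9 and 14 here), not any aggregate-wide width.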
Davin.
On Tue, 1 Nov 2011, Jeff Mohler wrote:
"Part of the issue here may be theoretical vs. real world. After all, in theory 'theory' and 'practice' are the same, but in practice they aren't."
But..they -are-.
Not in my experience, but certainly your mileage may vary.
I made those statements, and I will stand by them as a relatively knowledgeable person on these matters.
Build yourself a system in your lab (whomever is able) and watch via statit the distribution of IO across the aggregate FAR above the raid-group level.
20+2 will be within all reasonable margins of 2(10+2), or 4(5+2).
Ah, but on this I agree, and have all along. I don't think there's going to be a large difference between an aggr made of 24 disk rgs, 16 disk rgs, or 8 disk rgs.
However, if you *mix* rg sizes within the aggr, you are causing the filer to do more work.
"The other problem is that moving forward if you have optimized your writes for the 16 disk rg stripe size, some rgs are going to be optimized and some aren't, which could conceivably affect performance."
I'm trying to understand this statement as well. Writes are optimized for the full stripe of the aggregate. RG size, again, has nothing to do with it as a "stripe".
Raid groups are zones of data protection.
Yes, rgs are zones of data protection (really they are spindle protection in a NetApp -- basically the same thing, but again, theory vs. practice).
However, as Davin pointed out, there are different stripes in play here. RAID-4 and RAID-DP, by their nature, have stripes. They have to in order to calculate the parity data. So each rg has a stripe width.
An aggregate is, at the most basic level, just raid-0. We usually use the term "wide stripe" to talk about a stripe of stripes. We've created protection at the raid group level, and now we're gluing those raid groups together to take advantage of better i/o in a large number of spindles. That's why I create as large an aggregate as I can and let the filer figure out where the hot spots are. It also minimizes wasted space.
Anyway, that wide stripe works best when all of the calculations are the same. However, if you use different-sized raid groups within the aggregate, the filer has to do more work. It may be minimal work, and the filer can handle it, but these things start to add up if you're looking for performance. It may be that a 4 disk rg performs only a few percent slower than a 16 disk rg. Now combine that with a 12 disk and 6 disk rg and add them all to an aggregate where the filer has to do a few percent more calculations to do the wide striping properly...
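A toy illustration of that "stripe of stripes" picture (my own model, not NetApp internals): each RAID-DP group has its own full-stripe width (its data-disk count), and the aggregate is effectively RAID-0 across the groups, so with mixed group sizes the write allocator has to juggle several widths.

    # Greedily assign whole stripes round-robin across raid groups and show
    # how uniform vs. mixed data-disk widths play out. Purely illustrative.

    def full_stripe_plan(blocks_to_write, rg_data_widths):
        """Return (rg name, blocks written, full-stripe width) per chunk."""
        plan, i = [], 0
        while blocks_to_write > 0:
            width = rg_data_widths[i % len(rg_data_widths)]
            take = min(width, blocks_to_write)
            plan.append((f"rg{i % len(rg_data_widths)}", take, width))
            blocks_to_write -= take
            i += 1
        return plan

    for widths in ([12, 12, 12], [6, 12, 18]):
        print(f"RG data widths {widths}:")
        for rg, written, width in full_stripe_plan(60, widths):
            tag = "full stripe" if written == width else "partial stripe"
            print(f"  {rg}: wrote {written:2d} of {width} blocks ({tag})")

With uniform groups every chunk is the same size and every write is a full stripe; with mixed groups the chunk sizes differ per group and the tail ends up as a partial stripe.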
Man, Randall, you sure ask good questions. :)
-Adam
Not in my experience, but certainly your mileage may vary.
--- After thousands of perfstats, still looking for that example.
However, if you *mix* rg sizes within the aggr, you are causing the filer to do more work.
--- Where is the more work at? Yes, if this causes you to have one extra raid group on the box, there is minimally measurable overhead to manage additional raid groups... no surprise there.
"The other problem is that moving forward if you have optimized your
writes
for the 16 disk rg stripe size, some rgs are going to be optimized and some aren't, which could conceivably affect performance."
I'm trying to understand this statement as well. Writes are optimized for the full stripe of the aggregate. RG size, again, has nothing to do with it as a "stripe".
Raid groups are zones of data protection.
Yes, rgs are zones of data protection (really they are spindle protection in a NetApp -- basically the same thing, but again, theory vs. practice).
--- I think we can agree that data = spindles and not construct a difference of opinion over it. That's what I meant. :)
However, as Davin pointed out, there are different stripes in play here. RAID-4 and RAID-DP, by their nature, have stripes. They have to in order to calculate the parity data. So each rg has a stripe width.
--- Yes. But the application can't see it.
An aggregate is, at the most basic level, just raid-0. We usually use the term "wide stripe" to talk about a stripe of stripes. We've created protection at the raid group level, and now we're gluing those raid groups together to take advantage of better i/o in a large number of spindles. That's why I create as large an aggregate as I can and let the filer figure out where the hot spots are. It also minimizes wasted space.
--- Yes.
Anyway, that wide stripe works best when all of the calculations are the same. However, if you use different-sized raid groups within the aggregate, the filer has to do more work.
--- I'll accept that. But it doesn't manifest itself meaningfully in the USER space. The CP is outside the view of the user, and the read IO has no stripe construction overhead... so as I've been saying, the BP is NOT to do it, but in the vast majority of instances where people have done this by mistake/etc, it's not a huge problem.
The aggregate size by spindle and type is the performance envelope.
What goes on in a serialized domain after the client has had the write ACK'd isn't felt by the user, short of getting to the point of overrunning the controller with write IO itself... then we can talk about the shavings of performance left on the table. IMHO, RG count on a system has the same overhead, which would fire up the debate over small/mid/huge RG sizes to recapture the overhead in stripe calculation(s). Which, IMHO, is more a business decision over risk/reward.
It may be minimal work, and the filer can handle it, but these things start to add up if you're looking for performance. It may be that a 4 disk rg performs only a few percent slower than a 16 disk rg. Now combine that with a 12 disk and 6 disk rg and add them all to an aggregate where the filer has to do a few percent more calculations to do the wide striping properly...
Man, Randall, you sure ask good questions. :)
--- And at this level... we're in violent agreement. :) :)
I've never chastised (lightly or otherwise) a customer with RG's that don't balance out perfectly... we discuss it, but if a disk purchase is not in store to solve a perf issue I'm consulting over, I don't suggest that capital and effort be taken to resolve it at that time. It's rare, IMHO, to see that overhead manifest itself in the user space. You have to exhaust the CP engine to do it, and even then, you're likely still left with a sizing challenge to resolve. (Legacy HW servicing new user loads, etc.)
Later on... a few disk adds and a few injections of reallocate to the patient are in good order.
On Thu, 3 Nov 2011, Jeff Mohler wrote:
And at this level... we're in violent agreement. :) :)
Heh, indeed! :)
I've never chastised (lightly or otherwise) a customer with RG's that don't balance out perfectly... we discuss it, but if a disk purchase is not in store to solve a perf issue I'm consulting over, I don't suggest that capital and effort be taken to resolve it at that time. It's rare, IMHO, to see that overhead manifest itself in the user space. You have to exhaust the CP engine to do it, and even then, you're likely still left with a sizing challenge to resolve. (Legacy HW servicing new user loads, etc.)
I think that's a reasonable position to take.
Later on... a few disk adds and a few injections of reallocate to the patient are in good order.
Yup.
-Adam
Hey Guys!
To piggyback on the RG size vs performance discussions...
Can anyone explain why the RG size as such is not _that_ dominant to aggregate (write) performance? My current understanding of WAFL with respect to writes is that, due to the write-anywhere mantra, it is (most of the time?) doing what in RAID-6 (in SNIA definitions) would be the equivalent of a full-stride write.
If this is the case, how come there is no requirement for the RG size (N + P + Q) to be even? I'm sure I'm missing something here.
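For what it's worth, here is the usual back-of-envelope comparison (standard RAID arithmetic, not measured on a filer) of why full-stripe writes keep parity cheap:

    # Disk operations needed to write one stripe's worth of data on an N+2
    # group: full-stripe style versus classic read-modify-write. Assumes
    # dual parity and one op per block; purely illustrative.

    def full_stripe_ops(n_data):
        # write N data blocks plus 2 parity blocks, no parity reads needed
        return {"reads": 0, "writes": n_data + 2}

    def read_modify_write_ops(n_data):
        # per data block: read old data + both parities, then write the new
        # data + both parities -> 3 reads and 3 writes per block
        return {"reads": 3 * n_data, "writes": 3 * n_data}

    for n in (4, 14, 16):
        fs, rmw = full_stripe_ops(n), read_modify_write_ops(n)
        print(f"{n}+2 group: full-stripe {fs['writes']} writes / 0 reads, "
              f"RMW {rmw['writes']} writes / {rmw['reads']} reads")

A full-stripe write always costs N+2 writes and no parity reads, whereas classic read-modify-write costs roughly 3 reads and 3 writes per block, which is where RG size would otherwise start to matter.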
In the last discussions of RG size I've never seen anyone mention aggr/RG layout over SAS stacks/shelves, which seems a bit odd to me. From the sequential-read POV it should be crucial to balance the back-end SAS ports, but there is no mention of that in any NetApp TR I've come across. There should be some kind of automatic SAS path load balancing going on (on a per-disk basis?) automagically, but from my current observations ONTAP doesn't care about disk selection/placement when it comes to aggr construction.
NetApp tells you[1] that optimal RG size shouldn't differ by more than +- 1 within an aggregate - does this also apply to large aggregates composed of say:
6 RGs of size 12 plus another 4 RGs each holding 13 disks?
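A sketch of that "even layout" rule of thumb (my reading of the recommendation, not a NetApp tool): split the disk count into the fewest raid groups that fit under the maximum group size, with sizes differing by at most one.

    import math

    def even_layout(total_disks, max_rg_size):
        """Group sizes differing by at most one, using as few groups as possible."""
        n_groups = math.ceil(total_disks / max_rg_size)
        base, extra = divmod(total_disks, n_groups)
        # 'extra' groups get one more disk than the rest
        return [base + 1] * extra + [base] * (n_groups - extra)

    print(even_layout(124, 13))   # -> [13, 13, 13, 13, 12, 12, 12, 12, 12, 12]

124 disks with a maximum group size of 13 comes out as exactly the 4x13 + 6x12 layout in the question.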
On a somewhat related note, I've not come across any considerations of different SAS HBA capacities - just as an example:
A FAS3240A system that I'm currently looking at, with the IOXM option installed - each head has one dual-port on-board 6Gbps SAS HBA (ports 0a and 0b) and an additional _quad-port_ 6Gbps SAS HBA in the IOXM.
According to the platform documentation[2], in particular the platform block diagram, the IOXM is only switch-connected via 32x PCI-E 1.0 lanes to the PCI-E 2.0 (32x?) switch (which is connected to the PCI-E root complex) shared by the onboard SAS HBA and the FC HBA. So you have a quad-port 6Gbps HBA hanging off 8x PCI-E 1.0 in the IOXM (shared somewhat), and one dual-port 6Gbps HBA on 8x PCI-E 2.0. How do you balance this - if at all :) ?
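Some rough bandwidth arithmetic for that balancing question (nominal, single-direction figures; the lane count per SAS port is my assumption, so check the platform report for the real topology):

    # Back-of-envelope link bandwidths, using the slot widths quoted above.
    PCIE1_LANE_MBPS = 250      # PCIe 1.0, ~250 MB/s per lane after 8b/10b
    PCIE2_LANE_MBPS = 500      # PCIe 2.0, ~500 MB/s per lane
    SAS6_LANE_MBPS  = 600      # 6 Gb/s SAS, ~600 MB/s per lane after 8b/10b
    LANES_PER_SAS_PORT = 4     # assumed wide ports on the shelf connections

    ioxm_hba     = 4 * LANES_PER_SAS_PORT * SAS6_LANE_MBPS   # quad-port card
    ioxm_link    = 8 * PCIE1_LANE_MBPS                       # x8 PCIe 1.0 slot
    onboard_hba  = 2 * LANES_PER_SAS_PORT * SAS6_LANE_MBPS   # dual-port on-board
    onboard_link = 8 * PCIE2_LANE_MBPS                       # x8 PCIe 2.0

    print(f"IOXM quad-port SAS: {ioxm_hba} MB/s of SAS behind {ioxm_link} MB/s of PCIe")
    print(f"On-board dual-port: {onboard_hba} MB/s of SAS behind {onboard_link} MB/s of PCIe")

If those numbers are in the right ballpark, the quad-port card in the IOXM is far more constrained by its PCI-E 1.0 link than the on-board ports are, which seems worth keeping in mind when splitting stacks across the two.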
I'm asking because a client's system currently has 2x DS4243 populated with 48x 1TB drives, and another 2x DS4243 with 48x 1TB drives has been purchased and installed, so that each pair of DS4243 shelves is on its own stack. The current configuration is one aggregate with an RG size of 15 and 3 hot spares. The system is running 8.0.2P3.
According to [3], with the pre-8.1 maximum aggregate size, 60 disks with an RG size of 17 make for the optimal layout of a single 1TB-drive aggregate on a FAS3240A. For 8.1 I'd probably go with an RG size of 18 and 5 of them, leaving 2 spares per controller or so.
As 8.1 is basically just around the corner, I'd like to have the aggr designed with the 8.1 limits in the back of my head.
I would probably go now with a new aggr from the new shelves with an RG size of 17, do an aggr copy, destroy the old one, and add a few disks from the old one until 8.1 is running on the system.
Any opinions?
Thanks, P
—
[1] TR3838 (NDA), or check [3] for the most important content of TR3838
[2] FAS/V3200 Platform Report
[3] http://communities.netapp.com/message/52632#52632
Petar Forai
PGP-Key-Fingerprint: 4D15 F20B 6BB0 F68D 9580 2828 D17D BB4E 4DFF B82B
To answer the raid grouping... much work has been done to automagically assign disks properly.
If you have a brand new system with lots of disks and create a new large aggregate, you *will* notice that it is striped every which way it can be. In other words, if I have two stacks, each RAID group should get evenly (or roughly evenly, if there are odd numbers of disks) split between the stacks, and on each stack the disks get evenly split across the shelves, and at each shelf it will get split per row. I recently did a new ONTAP 8 install and had 4 stacks (two of 600GB SAS and two of 2TB SATA). What I described happened like it should. If you have a clean system, you should not worry about balancing, as ONTAP will take care of that for you. If I did a raid group of 14, for instance, and I have 2 x 4243 shelves in each stack, I should see something like: stack 1, shelf 1 and stack 2, shelf 1 with 4 disks each; stack 1, shelf 2 and stack 2, shelf 2 with 3 disks each.
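A toy version of that balancing behaviour (a model of what's described above, not ONTAP's actual disk-selection code):

    # Spread a raid group's disks round-robin across stacks, then across
    # shelves within each stack. Stack/shelf counts are just for the example.
    from collections import defaultdict
    from itertools import cycle

    def assign(rg_size, stacks=2, shelves_per_stack=2):
        slots = cycle((st, sh) for sh in range(shelves_per_stack)
                               for st in range(stacks))
        counts = defaultdict(int)
        for _ in range(rg_size):
            counts[next(slots)] += 1
        return dict(counts)

    for (stack, shelf), n in sorted(assign(14).items()):
        print(f"stack {stack + 1}, shelf {shelf + 1}: {n} disks")

For a 14-disk raid group over two stacks of two shelves this reproduces the 4/4/3/3 split described above.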
On top of all that, ONTAP will talk to half of the disks on each shelf on the A port and the other half on the B port. Everything is load balanced.
--tmac
Tim McCarthy, Principal Consultant
RedHat Certified Engineer 804006984323821 (RHEL4) 805007643429572 (RHEL5)
On 2011 Nov 2, at 2:28 , Adam Levin wrote:
On Tue, 1 Nov 2011, Cotton, Randall wrote:
Unfortunately, I've not been enlightened as to performance numbers for varying small aggregate sizes with one raid group, so I guess I'll have
Yeah, sorry about that, but it appears we just don't have that information available to us (and NetApp reps don't seem to want to answer :) ).
My guess is that it's not very different from most other RAID setups. There's a nice IOPS calculator for random raid setups here: http://www.wmarow.com/strcalc/strcalc.html
raw-disk-IOPS are roughly linear in the number of data drives. So one 10 disk raid-DP array (8 data disks) will be twice as fast as one 6 disk raid-DP array (4 data disks). At least in raw disk IOPS.
Of course, if your working set is small enough that most of it fits in cache, and you have a nice cache hit rate of, say, 90% (not unusual), the number of read IOPS is roughly multiplied by 10.
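A crude calculator along those lines (all inputs are assumptions, not NetApp numbers):

    # Raw random-read IOPS scale roughly with data spindles; a cache hit
    # rate multiplies the effective read IOPS, since only misses hit disk.
    PER_DISK_IOPS = 75          # assumed random IOPS for a 7.2k-rpm spindle

    def raw_iops(data_disks, per_disk=PER_DISK_IOPS):
        return data_disks * per_disk

    def effective_read_iops(data_disks, cache_hit_rate):
        # if the disks absorb only the misses, total reads = misses / (1 - hit)
        return raw_iops(data_disks) / (1 - cache_hit_rate)

    for data_disks in (4, 8):
        print(f"{data_disks} data disks: ~{raw_iops(data_disks)} raw IOPS, "
              f"~{effective_read_iops(data_disks, 0.90):.0f} read IOPS at 90% cache hit")

The 90% hit rate turns the raw spindle IOPS into roughly ten times as many effective read IOPS, which is the multiplier mentioned above.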
And... because of WAFL, netapp writes are generally extremely efficient, compared to random RAID writes. At least if your aggregates aren't too full.
You will have to google for your drive characteristics: mainly the drive seek latency.
And then, the netapp head will likely add a nice IO queue to the mix, getting even more IOPS out of your array.
And of course, your read/write ratio will make a difference too... although for netapps, certainly not as big as the wmarow.com site wants you to believe (in fact, because of the WAFL layout and NVRAM, writes might even be faster than reads. That of course assumes that your NVRAM is big enough).
And since I'm not a netapp engineer, there are probably other factors that influence the number of IOPS that a netapp produces, that I'm not aware of. Netapp does publish benchmarks for most of their storage systems, but these are usually based on benchmarks with LOTS of drives, so the netapp head itself is pushing its limits. With just a few dozen disks as the OP has, the netapp head won't be the limiting factor for any configuration.
So to summarise, I'm not surprised netapp doesn't give performance numbers for situations like the one you described, since there are simply too many factors in play for a realistic comparison.
(in fact, because of the WAFL layout and NVRAM, writes might even be faster than reads. That of course assumes that your NVRAM is big enough)
___ Ya, more on that later if another reader doesn't do it sooner. I'm thumb typing right now.
NVRAM size has little to do with NetApp write performance. It is always big enough. More won't go faster. Less won't go slower.
NetApp writes to the client are generally always fractions of a millisecond. The writes to disk are something userland never sees or feels directly.
Netapp does publish benchmarks for most of their storage systems, but these are usually based on benchmarks with LOTS of drives, so the netapp head itself is pushing its limits. With just a few dozen disks as the OP has, the netapp head won't be the limiting factor for any configuration.
___ You can never know that. No workload context exists here. It could easily be the bottleneck.