We're a relatively small shop just getting started with NetApp (one 2020 and two 2040's, all active/active, plus three 4342's for the 2040's).
Since committing a disk to a raid group via an aggregate is essentially permanent (you can't change your mind and later shrink an aggregate to pull a disk out of a raid group for use with another node), we'd prefer not to put all our 80 disks into aggregates just yet. We might want more in some nodes and fewer in others as future needs come into clearer focus. In addition, it's clear that we can later easily add in any disks we've held back and use the reallocate command with the -f option to re-optimize the layout to accommodate the added disks efficiently. So it seems a bit short-sighted to configure all 80 of our disks into aggregates among our 6 nodes right from the get-go (minus the requisite hot spares, of course), as our vendor would have us do.
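(For concreteness, here's the expansion sequence I have in mind; this is just my reading of the 7-mode docs, untested, and the aggregate/volume names are placeholders:

    filer> aggr add aggr0 4
    (add 4 held-back spares; they should fill out the existing raid group first)
    filer> reallocate start -f /vol/vol1
    (force a full reallocation pass so existing data spreads across the new disks)

Corrections welcome if I've misread the syntax.)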
But in deciding how small to start out with, we don't want to cripple our performance too much. I've looked long and hard on the net for some data, any data, on DOT 7.x performance vs raid group size, but have come up empty. I understand that performance should be awful and unacceptable with a raid group of size 3. I also understand from anecdotal evidence that performance improvements from larger raid group sizes are apparently not significant once you get to a raid group size of 16 or so.
But what about in between? How does a graph of performance vs raid size look from 3 to, say, 20? Just ballpark data on any type of remotely typical workload would help a lot to start with. Has anyone ever seen or tried to compile this kind of data? Using iometer, perhaps, or any other benchmarking tool? RAID-DP data is preferable, but I'd take RAID-4 data if that's all I can get.
Don't really have the time to do testing myself.
Thanks
Randall Cotton
University of Illinois Foundation
Randall,
have you seen TR-3838, the Storage Subsystem Configuration Guide? It doesn't have a definitive statement, but it does have a section that starts with the following caveats and describes some reasons for creating larger RAID groups than the prior default:
"4.6 RAID GROUP SIZING
The previous approach to RAID group and aggregate sizing was to use the default RAID group size. This no longer applies, because the breadth of storage configurations being addressed by NetApp products is more comprehensive than it was when the original sizing approach was determined."

Cheers,
Eugene
It's going to be more about the total # of data drives across the DATA stripe of the aggregate at work, not so much the raid groups operating as independent IO buckets.
Each RG is an independent parity-protected zone across the entire data stripe of the aggregate.
You don't want a TON of raid groups, because you have to calculate parity for all of them; that's why you have a range of options between the default and the max allowed, depending on your tolerance for recovery times in your business planning. More RG's means more parity to maintain (which really isn't overhead; your business plan tells you that's the COST of data protection).
As long as all of your raid groups in their current configuration are of the same time horizon (made at the same time, or reallocated since adding drives to any RG or adding new RG's), the stripe wants to operate as a whole across the ENTIRE aggregate.
This can be seen/measured via statit as uniformity of IO across the data drives in the observed raid group.
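If you want to try it, a rough sketch (statit lives in advanced privilege; prompts and timing are illustrative):

    filer> priv set advanced
    filer*> statit -b
    (let the workload run for a minute or two)
    filer*> statit -e

In the per-disk section of the output, reads/writes across the data drives of the aggregate should look roughly uniform if the stripe is operating as a whole.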
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com] Sent: Sunday, October 30, 2011 6:54 PM To: Eugene Vilensky Cc: Cotton, Randall; toasters@teaparty.net Subject: Re: Is there a graph somewhere of performance vs raid group size?
It's going to be more about the total # of data drives across the DATA stripe of the aggregate at work, not so much the raid groups operating as independent IO buckets.
Yeah, you're right. But for what it's worth, to start off with, I'll only have one raid group per aggregate. We just don't have that many disks yet.
But even with multiple rg's per aggregate, if you start from a just-initialized blank-slate aggregate and apply a specific workload, an aggregate with n rg's of size 12 is going to perform better than an aggregate with n rg's of size 8. The question is: how much better? I would assume the difference in performance would be in the same ballpark no matter what protocol you're using, as long as there are no bottlenecks. I could be wrong on that, though.
This can be seen/measured via statit as uniformity of IO across the
data drives in the observed raid group.
Thanks for the tip on (undocumented) statit.
Randall
Surely there is no sensible answer to this question, which is why NetApp refuses to quantify. We know that performance will increase with increased spindle count, but that assumes certain patterns of I/O and, more importantly, that you are near the limits of performance.
Personally, I would say that unless you think there is going to be a performance problem, don't worry too much about it; and if there is a performance problem, it's a lot better to avoid the I/O entirely with bigger caches than it is to add more spindles to aggregates.
Regards, pdg
It's because of two things.
1) RG size is not directly tied to performance itself. The total number of data drives per _aggregate_ is. A system with four 4+2 rg's in an aggregate and a system with a single 16+2 rg are within a verrrry narrow margin of each other performance-wise, the only difference being slightly more overhead for every additional raid stripe to manage in the 4x(4+2) example. Raid groups do not drive IO. They provide resiliency.
2) What value do you want? (Not directed at Peter, just in general.) I could lay it out in zebras per railroad car, but that's not your workload, is it?
Top down:
1] Aggregates provide physical performance. Size them for either throughput in MB/sec or physical IO's/sec. One is a want, the other is a need; your business planning determines that relationship.
2] Volumes provide user space to do work in. The workload within them is generally limited by the capability of the aggregate layer beneath them.
3] Raid groups provide firewalls of data protection in the multi-family unit called the aggregate. They don't inhibit the entry or exit of workload as long as the block everyone lives in is constructed and managed responsibly as you add units to it (reallocate, etc., because sometimes a tenant workload leaves a mess behind).
4] Your unique dataset with YOUR unique workload applied to it will result in...a unique result in every metric you could possibly measure, and want NetApp to provide a blanket answer to.
..see the problem with "we can't really say..."? I mean, other than more = mo-better.
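A quick back-of-envelope for 1], with made-up numbers: if your business plan says the workload needs about 4,000 random read IOPS, and you assume something like 175 IOPS per 15K spindle (substitute your own figure), that's roughly 4000 / 175 = ~23 data drives in the aggregate, before any cache effects. Plug in YOUR numbers; nobody can do that part for you.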
Sent from my iThingie
Just to clarify: for me, raid group size = aggregate size - that is, I'll only have one raid group per aggregate.
It would have been clearer if I'd asked for benchmarks of performance vs aggregate data disk count. My apologies.
Also, to clarify, I'm not asking NetApp (or anyone else) for a magic bullet single graph that will allow me to take one look and say "Oh, I need <blah> disks in my aggregate". I'm just looking for any kind of benchmarks of any kind of performance comparing different aggregate sizes so I can start to get a rough feel for performance (nebulously speaking) vs disk count. Anything (the more the merrier) that I can find which tests some kind of performance measure while varying only aggregate data disk count in the single and/or low double-digits would be helpful.
This doesn't seem to me an unreasonable thing to expect might be available somewhere. And it seems to me it's my professional responsibility to at least try to seek it out before recommending how we should (re)configure our boxes rather than being content to take a stab in the dark 8-)
Randall
On Sun, Oct 30, 2011 at 10:26 PM, Cotton, Randall <recotton@uif.uillinois.edu> wrote:
Just to clarify: for me, raid group size = aggregate size - that is, I'll only have one raid group per aggregate.
It would have been clearer if I asked for benchmarks of performance vs aggregate data disk count. My apologies.
--- The answer would be the same.
More = better. Just remove "raid groups" from your question regarding performance. It's the # of disks.
This doesn't seem to me an unreasonable thing to expect might be
available somewhere.
--- Well, it is.
More = better.
A system..any system, will have capabilities defined by limitations in multiple areas.
You'll run out of processing power (controller) or storage (disk) depending on how you prioritize what you need to do, with what you want to budget for.
And it seems to me it's my professional responsibility to at least try to seek it out before recommending how we should (re)configure our boxes rather than being content to take a stab in the dark 8-)
--- Any reason to leave performance on the table by not configuring all of what you bought in a reasonable manner?
A single RG of 22 disks, or two of 11, or...I dunno.
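E.g., strictly for illustration (7-mode syntax, 22 disks on one head; the two commands are alternatives, not a sequence):

    filer> aggr create aggr1 -r 22 22
    (one 20+2 RAID-DP group)
    filer> aggr create aggr1 -r 11 22
    (the same 22 disks as two 9+2 groups instead)

Same spindle count either way; the second layout spends two more drives on parity and buys you smaller fault domains.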
You're not exposing your goals. Dunno what you bought, dunno what you NEED to do with it, dunno what you WANT to do with it.
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com] Sent: Monday, October 31, 2011 12:42 AM To: Cotton, Randall Cc: toasters@teaparty.net Subject: Re: Is there a graph somewhere of performance vs raid group size?
Any reason to leave performance on the table by not configuring all of what you bought in a reasonable manner?
Simple: committing all disks to aggregates when you don't need to squeeze out every last bit of performance locks disks to specific nodes unnecessarily. Then, when you need more disks on a particular node, rather than just moving disks over from an underutilized node, you have to buy new disks or even a new shelf.
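(For the record, the kind of reassignment I want to preserve looks something like this, assuming software disk ownership; the disk name is made up and I haven't tried it:

    filer1> disk assign 0a.23 -s unowned -f
    (release the spare from the first controller)
    filer2> disk assign 0a.23
    (claim it on the other one)

That only works while the disk is still a spare, which is exactly why I don't want to commit everything to aggregates up front.)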
R
I think it's quite reasonable to only allocate resources when needed. IT shops don't always have perfect control over their purchasing budget's granularity; most of us are asked to forecast at least a year in advance what our requirements will be.
That being said, it's a good idea to set up your RG's and aggrs in a way that makes it easy to cleanly add capacity. Having one 4-disk RG and two others with 16 spindles is probably not a good idea.
On Mon, Oct 31, 2011 at 7:38 AM, Page, Jeremy <jeremy.page@gilbarco.com> wrote:
I think it's quite reasonable to only allocate resources when needed. IT shops don't always have perfect control over their purchasing budget's granularity; most of us are asked to forecast at least a year in advance what our requirements will be.
That being said, it's a good idea to set up your RG's and aggrs in a way that makes it easy to cleanly add capacity. Having one 4-disk RG and two others with 16 spindles is probably not a good idea.
--- Aggregates do not provide capacity.
Volumes do.
You MAY have business drivers to keep separate pools of spindles, but spindles that aren't together aren't parallelizing with each other under a common workload.
But there is no reason one 4+2 and two 16+2s can't perform quite well. There is no unreasonable penalty to that layout, if that's how it happened to grow over time, as long as the right tools are used to manage it.
Any reason to leave performance on the table by not configuring all of what you bought in a reasonable manner?
Simple: committing all disks to aggregates when you don't need to squeeze out every last bit of performance locks disks to specific nodes unnecessarily. Then, when you need more disks on a particular node, rather than just moving disks over from an underutilized node, you have to buy new disks or even a new shelf.
--- What node?
What's a node?
What's your goal?
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com] Sent: Monday, October 31, 2011 9:54 AM To: Cotton, Randall Cc: toasters@teaparty.net Subject: Re: Is there a graph somewhere of performance vs raid group size?
What's a node?
The term "node" is NetApp lingo for a controller, especially in an active/active configuration. I use "node" to indicate that all my controllers are in active/active pairs.
What's your goal?
To configure the aggregates on my six nodes with something less than every last disk I have (80), so that extras are available to add to nodes that need them in the (somewhat unpredictable) future. This way I'm not forced to buy new disks or shelves when I have perhaps severely underutilized disks on one node that I can't reassign because they're locked in an aggregate. All the while, I only wish to do this to the extent that performance is not impacted too greatly, say 10 or 15%.
Well, I know what a controller is, and a node...when in context.
You haven't described your goals, what you have, what you DON'T have...
What we need to agree on is that RG's are not performance-level objects.
What we also need to agree on is that we need to know what you're trying to do, in order to answer what you might wanna do with what you have.
What's your goal?
To configure the aggregates on my six nodes with something less than every last disk I have (80), so that extras are available to add to nodes that need them in the (somewhat unpredictable) future. This way I'm not forced to buy new disks or shelves when I have perhaps severely underutilized disks on one node that I can't reassign because they're locked in an aggregate. All the while, I only wish to do this to the extent that performance is not impacted too greatly, say 10 or 15%.
Six node Cmode cluster? Three HA pairs? Six single controller standalone systems?
"This way I’m not forced to buy new disks or shelves when I have perhaps severely underutilized disks on one node"
Well this part is simple.
What's your business plan say you need to do...for what applications, for what workloads, for how long.
==================== (Any Netapp Partners here going to Insight? Stop & say hi in TR-2-140 and TR-2-141..they made me present this year)
On Mon, 31 Oct 2011, Cotton, Randall wrote:
Just to clarify: for me, raid group size = aggregate size - that is, I'll only have one raid group per aggregate.
I'm curious about this statement. Any reason why? What exactly is it that you are trying to accomplish?
I may not have answers to rg size vs. performance, but I can tell you that if you limit the sizes of your aggregates, performance *will* suffer in almost every case.
...a rough feel for performance (nebulously speaking) vs disk count. Anything (the more the merrier) that I can find which tests some kind of performance measure while varying only aggregate data disk count in the single and/or low double-digits would be helpful.
While you'll start seeing benefits of striping almost immediately, you won't get real gains for a full data center until you have many, many disks sharing the I/O load.
This doesn't seem to me an unreasonable thing to expect might be available somewhere. And it seems to me it's my professional responsibility to at least try to seek it out before recommending how we should (re)configure our boxes rather than being content to take a stab in the dark 8-)
The thing is, it's in the vendor's best interests to help you configure the units in a way that they perform well and meet your needs. Granted the vendor also wants to sell more hardware and more disks, but it's worth working with them and getting white papers about it.
The usual suggestions from NetApp were to use 16 disk RAID-DP raid groups and make the aggregates as big as possible -- that spreads the load, increases IOPS, and allows the array to move data around to avoid hot spots as it sees fit. Why make more work for yourself?
-Adam
-----Original Message----- From: Adam Levin [mailto:levins@westnet.com] Sent: Monday, October 31, 2011 6:52 AM To: Cotton, Randall Cc: toasters@teaparty.net Subject: RE: Is there a graph somewhere of performance vs raid group size?
On Mon, 31 Oct 2011, Cotton, Randall wrote:
Just to clarify: for me, raid group size = aggregate size - that is,
I'll only have one raid group per aggregate.
I'm curious about this statement. Any reason why? What exactly is it
that you are trying to accomplish?
Well, I simply don't have enough disks yet to fill up a single raid group per node. 80 disks, 6 nodes.
I may not have answers to rg size vs. performance, but I can tell you
that if you limit the sizes of your aggregates, performance *will* suffer in almost every case.
Sure, no doubt, but if I can save, say, 3 or 4 disks that I can use in any of my 6 nodes some time in the future by making my aggregate 12 or 13 disks instead of 16, and I only lose, say, 10% performance potential by doing so, it will make sense for us.
While you'll start seeing benefits of striping almost immediately, you
won't get real gains for a full data center until you have many, many disks sharing the I/O load.
Right, though I understand from conventional wisdom that the gains diminish to tiny increments past about 16 data disks per aggregate (and your exposure to a failed disk and long reconstruction times becomes big enough that going past 16 may not be worth it).
The thing is, it's in the vendor's best interests to help you
configure the units in a way that they perform well and meet your needs. Granted the vendor also wants to sell more hardware and more disks, but it's worth working with them and getting white papers about it.
Well said. I'll go down that road, certainly.
Thanks, Randall
On Mon, 31 Oct 2011, Cotton, Randall wrote:
Well, I simply don't have enough disks yet to fill up a single raid group per node. 80 disks, 6 nodes.
Is there a reason you bought so many filers and so few disks? It seems to me that management would have been much simpler (and this whole situation could have been avoided, in fact) with one filer. 80 spindles is nothing to a single filer head, let alone a clustered filer.
Right, though I understand from conventional wisdom that the gains diminish to tiny increments past about 16 data disks per aggregate (and your exposure to a failed disk and long reconstruction times becomes big enough that going past 16 may not be worth it).
You're conflating aggregates and raid groups again.
The performance within a single raid group isn't much different after 16 disks. The rebuild time for a single disk is the same regardless of raid group size. The chances of a double-disk failure within a single raid group go up a bit with larger raid groups.
However, in an *aggregate*, the more spindles you have (that is, the more raid groups, since you want to add full raid groups when you can), the better the performance, and it does keep going up because the I/O spreads out more and more as you go to very wide stripes.
-Adam
-----Original Message----- From: Adam Levin [mailto:levins@westnet.com] Sent: Monday, October 31, 2011 10:29 AM To: Cotton, Randall Cc: toasters@teaparty.net Subject: RE: Is there a graph somewhere of performance vs raid group size?
On Mon, 31 Oct 2011, Cotton, Randall wrote:
Well, I simply don't have enough disks yet to fill up a single raid group per node. 80 disks, 6 nodes.
Is there a reason you bought so many filers and so few disks? It
seems to me that management would have been much simpler (and this whole situation could have been avoided, in fact) with one filer. 80 spindles is nothing to a single filer head, let alone a clustered filer.
No argument there. There are 3 separate geographical sites, though, and HA was desired everywhere. For what it's worth, I was brought on after the purchase.
Right, though I understand from conventional wisdom that the gains diminish to tiny increments past about 16 data disks per aggregate (and your exposure to a failed disk and long reconstruction times becomes big enough that going past 16 may not be worth it).
You're conflating aggregates and raid groups again.
Ooh, yes, quite correct. 16 disks per raid group, not aggregate, is what I meant. Thanks for correcting that.
However, in an *aggregate*, the more spindles you have (that is, the
more raid groups, since you want to add full raid groups when you can), the better the performance, and it does keep going up because the I/O spreads out more and more as you go to very wide stripes.
This is an important distinction, and though I'm not able to afford the luxury yet of multiple raid groups per aggregate, it's an excellent point to make and keep in mind.
Thanks Randall
I suspect you have not 80 but 84 disks, in 6 DS14 shelves. 80 is a bit of an odd number for NetApp and is quite hard to come by in a new purchase. :) In that case you will *not* be able to redistribute disks anyway: you have 2 full shelves per site, and to add more you will need a new shelf. Or are you planning on putting half of the shelves in stock for the future?
--- With best regards
Andrey Borzenkov Senior system engineer Service operations
On Mon, 31 Oct 2011, Cotton, Randall wrote:
No argument there. There are 3 separate geographical sites, though, and HA was desired everywhere. For what it's worth, I was brought on after the purchase.
Heh, ain't that always the way? It's trouble when you start inheriting someone else's problems. :)
Ooh, yes, quite correct. 16 disks per raid group, not aggregate, is what I meant. Thanks for correcting that.
No problem.
However, in an *aggregate*, the more spindles you have (that is, the more raid groups, since you want to add full raid groups when you can), the better the performance, and it does keep going up because the I/O spreads out more and more as you go to very wide stripes.
This is an important distinction and though I'm not able to afford the luxury yet of multiple aggregates per node, it's an excellent point to make and keep in mind.
Well, so given what you have: 6 x (14+2) = 96. It seems odd not to have purchased at least that many drives; you can't even have one full 16-disk rg per filer head. Your best bet is probably just to take the drives you have, split them evenly among the heads, and create 16-disk raid groups from them (when you add drives later, fill in the rest of the data drives for the current rg first). Of course, again, this is without knowing the various use cases for each of the locations.
The main thing is that rather than thinking "one raid group = one aggregate", you should be thinking "many raid groups make up a single aggregate". I would stick to 16-disk groups: you can create partial groups for now, but as you grow, try to make the purchases in rg-sized increments (a sketch of that growth path follows). Keep growing that aggregate over time as you make additional disk purchases until you've reached the maximum aggregate size, and then start another one. There are very few cases where you want a separate aggregate (syncmirror used to care; not sure it does anymore).
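Hypothetically (7-mode syntax; names and counts made up, for a head that got 13 of your disks):

    filer> aggr create aggr0 -r 16 12
    (one partial 10+2 group; the 13th disk stays as a hot spare)
    filer> aggr add aggr0 4
    (a later purchase completes the group to 14+2)
    filer> aggr add aggr0 16
    (the next purchase adds a whole new 14+2 group)
    filer> reallocate start -f /vol/yourvol
    (re-spread existing data after each growth step)

Adjust for your spares policy, of course.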
I hope for your sake they are at least FC drives and not SATA.
-Adam
Performance of an aggregate is the total sum of spindles across the entire aggregate.
There is no 'raid group size' in the above sentence.
If you have TEN 2+1 raid groups, think about it this way: you have a 20-drive RAID-0 stripe of IOPS available to you, not ten separate buckets of two.
Forget parity as a limitation to system performance; you don't have a system large enough to worry about it.*
*: If you have a monstrous system, yeah, you can remotely measure some overhead with 64-bit aggregates if you have 18 raid groups vs 7 raid groups, with the difference being raid group size. The issue isn't the 64-bit aggregate structure, it's just the # of raid zones to manage on writes. No surprise there...just drawing out an obscene example. :) But either way, you still have the same # of drives at work in the total stripe.
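You can eyeball this on your own box (7-mode):

    filer> sysconfig -r
    (or: aggr status -r)

Either one lists each raid group with its dparity, parity, and data drives; add up the data rows across all the groups in an aggregate and that's the stripe you're actually writing to.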
Well, from what I gather in searching for TR-3838, it's not publicly available, so no, I hadn't even heard of it until now 8-)
I'd love to see it, though.
AFAIK, the closest the main documentation gets to this subject is the Storage Management Guide (p. 108 in the 7.3 docs), where it says:
Large RAID group configurations offer the following advantages:
* More data drives available. An aggregate configured into a few large RAID groups requires fewer drives reserved for parity than that same aggregate configured into many small RAID groups.
* Small improvement in storage system performance. Write operations are generally faster with larger RAID groups than with smaller RAID groups.
Small RAID group configurations offer the following advantages:
* Shorter disk reconstruction times. In case of disk failure within a small RAID group, data reconstruction time is usually shorter than it would be within a large RAID group.
* Decreased risk of data loss due to multiple disk failures. The probability of data loss through double-disk failure within a RAID4 group or through triple-disk failure within a RAID-DP group is lower within a small RAID group than within a large RAID group.
But there's not even a hint of performance data.
Poking around a little more now, I see that a portion of TR-3838 that includes the text you quoted ("The previous approach to RAID group and aggregate sizing...") does appear near the end of a long NetApp Community post on this topic (at http://communities.netapp.com/thread/1587). But I'm not sure what was posted includes all the pertinent info from TR-3838. I did recently read that whole thread, by the way, and, as you noted, it doesn't have a definitive statement or definitive performance data.
Thanks for the tip, though. I'm another step closer now 8-)
R
On Sun, Oct 30, 2011 at 7:33 PM, Cotton, Randall <recotton@uif.uillinois.edu> wrote:
Well, from what I gather in searching for TR-3838, it's not publicly available, so no, I hadn't even heard of it until now 8-)
I'd love to see it, though.
It's a good one. Try requesting it from your SE.