Greetings:
I am new to the NetApp world and am looking for some advice/guidance. We purchased a NetApp 3020i with two shelves (10K 144GB FC drives) a few months ago. We have now placed an order for 4 additional shelves (fully loaded) with 10K 144GB drives.
The current configuration is one large RAID-DP group with 1 hot spare (a 27-disk RAID-DP group), which belongs to one aggregate (aggr0).
My question:
1. The new enclosures: should I create 2 x 28-disk RAID-DP groups and add them to the existing aggregate? The application we run is I/O intensive, and the more spindles the better. Does anyone see any disadvantages to doing the above?
2. Does anyone have a formula to calculate mean time to repair for the 10K 144GB disks on a busy filer? I only have one hot spare disk; should I add more? Is there a rule of thumb?
3. We currently have multiple volumes on the filer. After adding the new shelves/spindles, if I run a reallocate on the existing volumes, will it take advantage of the additional spindles?
Any help/guidance greatly appreciated.
Thanks, Darish.
Hey Darish,
On 3/17/07, Darish Rajanayagam darishr@softcom.biz wrote:
The current configuration is one large RAID-DP group with 1 hot spare (a 27-disk RAID-DP group), which belongs to one aggregate (aggr0).
My question:
The new enclosures: should I create 2 x 28-disk RAID-DP groups and add them to the existing aggregate? The application we run is I/O intensive, and the more spindles the better. Does anyone see any disadvantages to doing the above?
The performance of a raid group depends on the number of spindles in the raid group, and in my experience not on the number of raid groups in an aggregate. In my experience, most applications perform best with about 22-25 disks per raid group; beyond that, performance doesn't seem to increase much anymore.
If you need to serve the data out of one volume (mountpoint/share/LUN), you have no option but to expand your aggregate, and in general that works best. You can, however, create more aggregates; this gives you more future flexibility, since you cannot destroy individual raid groups, only whole aggregates.
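To make the parity trade-off concrete, here is a minimal sketch (Python; the only rule it encodes is RAID-DP's two parity disks per raid group, and the 56-disk total is just the four new shelves from this thread - the helper name is mine, not a NetApp tool):

```python
# Sketch: compare usable-spindle counts for RAID-DP raid groups of
# different sizes. RAID-DP dedicates 2 parity disks per raid group.
def dp_layout(total_disks, rg_size):
    """Return (data_disks, parity_disks) when total_disks are split
    into raid groups of at most rg_size disks each."""
    full, rem = divmod(total_disks, rg_size)
    groups = full + (1 if rem else 0)
    parity = 2 * groups              # 2 parity disks per RAID-DP group
    return total_disks - parity, parity

for rg in (14, 16, 22, 28):
    data, parity = dp_layout(56, rg)
    print(f"rg_size={rg:2d}: {data} data disks, {parity} parity disks")
```

Larger raid groups lose fewer spindles to parity, at the cost of longer reconstruct exposure per group.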
Does anyone have a formula to calculate mean time to repair for the 10K 144GB disks on a busy filer? I only have one hot spare disk; should I add more? Is there a rule of thumb?
I generally use one hot spare per set of 40-60 disks, depending on the need for redundancy. With RAID-DP on FC you are reasonably safe, so I'd opt for one spare per 60 disks and round that number up.
I have seen reconstruct times of roughly 3-4 hours for a 10K 144GB disk; this of course depends a lot on the load of your box. You can always tune the reconstruct priorities. Also, don't forget you have RAID-DP, so you are protected against a double-disk failure within a raid group.
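A quick sketch of that rule of thumb (the one-spare-per-60-disks divisor is my own judgment call above, not a published NetApp number, and the disk counts are the ones from this thread):

```python
import math

# Rule-of-thumb sketch: roughly one hot spare per 60 disks when
# running RAID-DP, rounded up. The divisor is an assumption, not a
# NetApp-published figure.
def spares_needed(total_disks, disks_per_spare=60):
    return math.ceil(total_disks / disks_per_spare)

print(spares_needed(28))   # current two shelves
print(spares_needed(84))   # after adding four more shelves
```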
We currently have multiple volumes on the filer. After adding the new shelves/spindles, if I run a reallocate on the existing volumes, will it take advantage of the additional spindles?
Only if you expanded the aggregate the volume resides in. And yes, if at all possible, try to reallocate when you grow a busy aggregate. You can also schedule regular reallocate jobs using the "reallocate schedule" command, which IIRC appeared in 7.0.5.
A scheduled reallocation job will first scan to see whether reallocation is needed, and only then fire off a WAFL reallocate.
Gr,
Nils
On the rebuild times: in 7.2 (I think it started there) you get rapid disk rebuild (or something like that - it was called sick disk copy, part of the maintenance garage), which is much faster than a full rebuild. It copies the good parts of the failing disk directly, disk to disk, then rebuilds from parity only what needs to be rebuilt.
-Blake
NetApp had recommended to us an RG size of 16 with RAID-DP. Will we get better performance out of a 22-25 disk RG, or is that for specific applications?
-- Daniel Leeds Senior Systems Administrator Edmunds.com
-----Original Message----- From: owner-toasters@mathworks.com on behalf of Nils Vogels Sent: Sat 3/17/2007 3:10 PM To: Darish Rajanayagam Cc: toasters@mathworks.com Subject: Re: FAS3020 - aggr best practises.
It's more of an "it depends" kind of thing. If you look at statit, do you see your disks doing more than 100 IOPS (120 being around the limit for a 10K RPM disk)? Or short chain lengths? It'll mostly depend on your workload. You can send a statit output if you'd like help with that.
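As a back-of-the-envelope sketch (the ~120 IOPS per 10K RPM FC disk figure is from this thread; counting only data disks toward random reads is a conservative assumption, and the helper name is made up for illustration):

```python
# Rough estimate of an aggregate's random-IOPS ceiling: data spindles
# times a per-disk IOPS figure for 10K RPM FC drives. Assumption:
# only data disks contribute useful read IOPS.
def aggr_iops_ceiling(data_disks, iops_per_disk=120):
    return data_disks * iops_per_disk

# The original 27-disk RAID-DP group has 25 data disks:
print(aggr_iops_ceiling(25))   # -> 3000
```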
-Blake
Thank you all for your feedback.
I have run perfstat on my system. Most of the time, on the existing two enclosures, we are around 80 IOPS; however, at times we shoot up to 170 IOPS and the application comes to a crawl, which is why we are adding more spindles - along with bringing more applications onto the NetApp.
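As a rough sketch of the headroom math (this reads the thread's 80/170 figures as per-disk IOPS and reuses the ~100 IOPS per 10K disk comfort target from earlier in the thread - both are assumptions):

```python
import math

# Sketch: how many data spindles would keep per-disk load under a
# target, given an observed per-disk peak. Target of 100 IOPS per
# 10K RPM disk is an assumption from this thread.
def spindles_for_load(peak_iops_per_disk, current_data_disks, target=100):
    total_peak = peak_iops_per_disk * current_data_disks
    return math.ceil(total_peak / target)

# 25 data disks peaking at 170 IOPS each:
print(spindles_for_load(170, 25))   # -> 43
```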
Nils, I will have multiple volumes - we will be serving out data via LUNs/iSCSI, NFS and CIFS. My initial thought was to create raid groups/aggregates based on protocol - NFS, iSCSI, etc. However, this will fragment my storage, and by creating multiple raid groups I lose more disks to parity. It would also limit the number of spindles available to some protocols and possibly cause performance issues later on.
Peter, thanks for your feedback as well, I am working with our NetApp partner SE as well to get some best practices guidance.
Thanks again - it's nice to have the insight, knowledge and opportunity to discuss implementation best practices with experienced NetAppers!
Darish.
-----Original Message----- From: Blake Golliher [mailto:thelastman@gmail.com] Sent: Saturday, March 17, 2007 9:46 PM To: Leeds, Daniel; Nils Vogels; Darish Rajanayagam; toasters@mathworks.com Subject: Re: FAS3020 - aggr best practises.
On 3/19/07, Darish Rajanayagam darishr@softcom.biz wrote:
Nils, I will have multiple volumes - we will be serving out data via LUNs/iSCSI, NFS and CIFS. My initial thought was to create raid groups/aggregates based on protocol - NFS, iSCSI, etc. However, this will fragment my storage, and by creating multiple raid groups I lose more disks to parity. This would also limit the number of spindles on some protocols and possibly cause performance issues later on.
It is generally a good idea to split at least block-based I/O (iSCSI/FCP) from file-based I/O (CIFS/NFS) at the aggregate level, since they use the NetApp's resources in different ways and might actually interfere with each other in some cases.
Gr,
Nils
Hi Darish, welcome to NetApp and to Toasters!
1. You can do an "aggr add aggr0 56" or use the FilerView GUI to add all 56 of the new disks to the existing aggregate. You can physically add the shelves and grow the aggregate while the filer is up and running. I see no disadvantages, and that is the best practice. (Add disks in large sets, ideally a full raid group at a time.)
2. Rebuild times vary depending on load. You can adjust the priority. This is less of an issue on a filer using RAID-DP, because you are still protected if you lose a second disk in the same raid group. Having one hot spare for this number of disks is fine, again since you have RAID-DP. Some people may have different opinions.
3. Even if you do nothing, WAFL will spread writes across all spindles, even for existing volumes. You can also reallocate, but much of that will simply happen over time. Reallocate usually helps most with reads.
Feel free to ask your questions here, but you should also be able to get these answered from your NetApp or partner SE - that's what we get paid for!
Share and enjoy!
Peter
From: Darish Rajanayagam [mailto:darishr@softcom.biz] Sent: Saturday, March 17, 2007 9:59 AM To: 'toasters@mathworks.com' Subject: FAS3020 - aggr best practises.
Point one reminds me of a question I've been pondering. I've got a filer using RAID-DP with 3 x 14-disk raid groups, and currently 7 spares. Ideally I'd keep only two spares, but I'm still not clear on the pros/cons of adding a 5-disk raid group, which would effectively add only three more data disks to the volume.
I'm not in a space crunch currently, but I certainly will be at some point. Is it best to keep so many extra spares until I can add a full raid group all at once? Or, in my case, is it not that important?