Best practice (based on my reading of the archives) seems to be to distribute disk membership in an aggregate across disk shelves.
This would appear to be for performance reasons primarily (less chance of saturating a shelf's "uplink" to the controller), but how does it affect reliability?
If I limit myself to one aggregate per shelf, then losing that shelf costs me only the one aggregate. If aggregates are distributed, I could lose all of them.
My thought is that the chance of the shelf failing is actually pretty slim as its hardware isn't all that sophisticated.
And obviously there are performance penalties for limiting to one aggregate per shelf (disk count maximums).
Thanks, Ray
Back in the R100 days it was *required* to spread the volumes, in a very specific order, vertically across shelves (volumes at that time were the aggregates of today). The logic is still sound from a performance view, but keep in mind that if your spares are spread randomly, every time you lose a drive your carefully planned layout degrades a little.
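That "vertical" spread can be pictured with a small sketch (hypothetical layout helper, not NetApp tooling): take one disk from each shelf in turn, so no single shelf ever holds more than its round-robin share of a raid group.

```python
# Sketch (hypothetical, not NetApp tooling): build a raid group "vertically"
# by taking one disk from each shelf in turn, R100-style, and confirm that
# no single shelf holds more than ceil(group_size / num_shelves) members.
from collections import Counter
from math import ceil

def vertical_layout(num_shelves, group_size):
    """Assign raid-group member disks round-robin across shelves;
    returns the shelf ID hosting each member disk."""
    return [disk % num_shelves for disk in range(group_size)]

layout = vertical_layout(num_shelves=6, group_size=16)
per_shelf = Counter(layout)
# 16 disks over 6 shelves: no shelf carries more than 3 of the group
assert max(per_shelf.values()) == ceil(16 / 6)
print(per_shelf)
```

Random spare consumption breaks exactly this invariant: each rebuild pulls a spare from wherever one happens to sit, and the per-shelf counts drift away from the round-robin ideal.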
However, look at your current IO requirements and decide. Shelves today have 24 15k drives in them with 3Gb or more on the loop. You need an awful lot of IO to saturate that. Certainly it can be done, but if your filer is generally hovering around 20k ops or a couple hundred MB of throughput, it's unlikely you're going to saturate any single new shelf, let alone if that IO is spread over multiple shelves.
If you're talking about DS14s, the numbers drop notably of course, but the same logic applies. Maybe those thresholds drop to 8k ops or 50MB of throughput. Not sure.
With multi-pathing the odds of losing a loop completely are very, very low. Personally I just let WAFL decide where to grab drives from; haven't cared about that level of control for a long time now...
Jeff Kennedy Qualcomm, Incorporated QCT Engineering Compute 858-651-6592
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Ray Van Dolson Sent: Thursday, April 07, 2011 12:58 PM To: toasters@mathworks.com Subject: Distribute aggregate across shelves or limit to one shelf?
Whatever you do, there is always a tradeoff between reliability, performance, and efficiency. I think all of these concerns are well answered in NetApp's storage resiliency paper: do RAID-DP, do backups, do HA, do multipathing, do disk auto-assign, keep spare parts, etc., and you'll have 99.999% availability and maximum performance, as long as you can afford it.
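For a sense of scale, "five nines" works out like this (plain arithmetic, no NetApp-specific data):

```python
# What 99.999% availability means in practice: the unavailable fraction
# of a year, expressed in minutes.
minutes_per_year = 365.25 * 24 * 60
downtime_minutes = minutes_per_year * (1 - 0.99999)
print(round(downtime_minutes, 2))  # about 5.26 minutes of downtime per year
```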
And by the way, limiting yourself to one aggregate per shelf produces a lot of work reassigning disks and spares over time as your system grows. After 10+ years of operating NetApp systems, I have always had my aggregates (volumes, earlier) spread across all shelves and have never had any trouble with it.
-SF
On 07.04.2011 21:58, Ray Van Dolson wrote:
I'd second Stefan's comment. User error is generally the cause of downtime, not hardware outages (assuming a reasonably architected system from a redundancy standpoint). Complicating things generally makes you more prone to outages, not less.
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Stefan Funke Sent: Friday, April 08, 2011 2:18 AM To: toasters@mathworks.com Subject: Re: Distribute aggregate across shelves or limit to one shelf?
Definitely want to spread across multiple shelves if possible. Not only does this give you better performance, as the workload is distributed across the shelves, it actually gives you better protection in the event of a shelf loss.

We had a shelf failure on a system that had 10 aggregates distributed across 12 shelves. The system panicked and rebooted, and performance was horrible as it saw several aggregates with double drive failures and tried to rebuild them (running out of online spares in the process), but we lost no data at all. NetApp support was great about getting additional spare drives and a replacement shelf to us quickly. It took a couple of days for performance to return to 100% as it continued rebuilding the aggregates, but we lost no data and kept the system online throughout the rebuild after replacing the shelf.
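The reason that shelf loss was survivable comes down to RAID-DP's two-failure tolerance per raid group; a small sketch (hypothetical layout, not ONTAP output) makes the condition explicit:

```python
# Sketch (hypothetical layout, not ONTAP output): RAID-DP tolerates two
# failed disks per raid group, so an aggregate can survive losing an
# entire shelf as long as no raid group has more than two members there.
def survives_shelf_loss(raid_groups, failed_shelf, max_failures=2):
    """raid_groups: list of raid groups, each a list of the shelf IDs
    hosting that group's member disks."""
    return all(rg.count(failed_shelf) <= max_failures for rg in raid_groups)

# Two 8-disk groups striped over four shelves, two disks per shelf per
# group: shelf 2 dying is a double-disk failure in each group -- ugly
# rebuilds, but recoverable.
striped = [[0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 1, 1, 2, 2, 3, 3]]
print(survives_shelf_loss(striped, failed_shelf=2))   # True

# The same 8 disks confined entirely to shelf 2 would not survive it.
print(survives_shelf_loss([[2] * 8], failed_shelf=2)) # False
```

This is also why the one-aggregate-per-shelf idea cuts the wrong way: it concentrates every raid group of that aggregate on the very shelf whose loss you were worried about.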
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Ray Van Dolson Sent: Thursday, April 07, 2011 3:58 PM To: toasters@mathworks.com Subject: Distribute aggregate across shelves or limit to one shelf?