I'm catching up on the 6-week-old thread on this topic, and wondered where anyone is with deployment, stability, etc.
It sounded like everyone in that thread went with my initial instinct of "make the aggregate as big as you can, and stuff it with flexvols". I'm wondering if that's the smart thing to do in a real-world scenario, or if there isn't some "middle way".
If three disks fail in any one RAID-DP group in the aggregate, or if two disks fail and the operator accidentally yanks out a third disk while trying to pull one of the first two, or (insert nightmare scenario here), then it's tape restore time for *every flexvol in the aggregate*, isn't it? It's an extremely long shot with RAID-DP, but a very bad outcome if you hit that particular lottery.
I'm trying to decide how to think about that. Maybe divide up the shares into different functional groups, or by space utilization, and do three aggregates instead of one? That still gives lots of space flexibility, but a bad RAID group only takes down a third of the Universe instead of the whole thing. The same issue comes up in choosing the "sweet spot" for RAID group size. I have 12 shelves in the FAS960, and I'm sure I want to minimize the number of disks from the same RAID group that share a shelf. One per shelf is ideal, two is tolerable, and three is "right out".
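For concreteness, here's the sort of thing I'm picturing. The aggregate names, disk counts, and RAID group size are made up for illustration, not a worked-out layout:

    aggr create aggr1 -t raid_dp -r 12 36@72g
    aggr create aggr2 -t raid_dp -r 12 36@72g
    aggr create aggr3 -t raid_dp -r 12 36@72g
    aggr status -r     (check how the RAID groups actually landed across shelves)

(Using the -d form with explicit disk names instead of "36@72g" would be the way to control which shelf each disk comes from.)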
Thoughts from smart folks appreciated, especially smart folks with working implementations. ;-)
/// Rob
On Tue, Mar 22, 2005 at 01:44:59PM -0500, Rob Winters wrote:
> I'm catching up on the 6-week-old thread on this topic, and wondered where anyone is with deployment, stability, etc.
> It sounded like everyone in that thread went with my initial instinct of "make the aggregate as big as you can, and stuff it with flexvols". I'm wondering if that's the smart thing to do in a real-world scenario, or if there isn't some "middle way".
> If three disks fail in any one RAID-DP group in the aggregate, or if two disks fail and the operator accidentally yanks out a third disk while trying to pull one of the first two, or (insert nightmare scenario here), then it's tape restore time for *every flexvol in the aggregate*, isn't it? It's an extremely long shot with RAID-DP, but a very bad outcome if you hit that particular lottery.
Assuming you don't do any mirroring to another shelf or filer, I'd think so.
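If you did want that kind of protection, the two obvious knobs would be SyncMirror or volume SnapMirror; the syntax below is from memory, and the filer, aggregate, and volume names are made up:

    aggr create aggr1 -m -t raid_dp 28@72g
        (SyncMirror: the disks are split across two plexes; needs the syncmirror_local license)
    snapmirror initialize -S filer1:vol1 filer2:vol1
        (volume SnapMirror to a second filer; run on the destination, with the destination volume restricted first)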
That's a good point. Another one I thought of previously: what if you want to retire a set of older shelves in the future? Flexibility is great for operations, but large aggregates might tempt the lazy into making one giant aggregate, and then you can't remove any shelves without destroying the entire aggregate.
After thinking about it and discussing it with some co-workers, I chose to make two aggregates: one with two 36G shelves and a 72G shelf, and the other with a 72G and a 144G shelf. Each has several flexvols inside and benefits from the flexibility and performance of having data spread over multiple shelves, and the data on the 36+36+72 aggregate will most likely fit onto new additional shelves by the time we want to retire the 36s.
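Roughly what that looked like on the console; the disk counts and volume names below are illustrative rather than exact, so treat this as a sketch:

    aggr create aggr0 -t raid_dp 28@36g
    aggr add aggr0 14@72g
    aggr create aggr1 -t raid_dp 14@72g
    aggr add aggr1 14@144g
        (ONTAP may complain about mixing disk sizes within an aggregate; check the aggr man page before forcing it)
    vol create eng  aggr0 400g
    vol create home aggr1 800g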
As an aside, it would be nice if we could grow a raid group by replacing smaller disks with larger ones and letting WAFL fill out the unused portions once a complete set of new disks is in place. Also, in my dreams and wishes: if it's possible to do a wafl reallocate to spread data among disks efficiently, I don't know why it would be so hard to re-spread data onto fewer disks with the intention of removing disks from a raid group to shrink it.
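The closest existing pieces I'm aware of are below; the syntax is from memory and the disk names are just examples:

    reallocate start /vol/home
        (kick off a reallocation scan to re-spread an existing flexvol's blocks across the disks)
    disk replace start 8a.17 8a.32
        (copy a data disk onto a specific spare, which could be a larger disk)

Neither of those actually grows or shrinks a raid group, which is the part I'm wishing for.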
> I'm trying to decide how to think about that. Maybe divide up the shares into different functional groups, or by space utilization, and do three aggregates instead of one? That still gives lots of space flexibility, but a bad RAID group only takes down a third of the Universe instead of the whole thing. The same issue comes up in choosing the "sweet spot" for RAID group size. I have 12 shelves in the FAS960, and I'm sure I want to minimize the number of disks from the same RAID group that share a shelf. One per shelf is ideal, two is tolerable, and three is "right out".
> Thoughts from smart folks appreciated, especially smart folks with working implementations. ;-)
> /// Rob