Has anyone ever completely, completely (like with LUNs and overallocation) filled an aggregate? Were you able to reuse the aggregate again once you reduced the space (presumably by shrinking or deleting a volume or two)? Assume this is not the root aggregate.
Putting aside how bad it is (all the vols will fill, LUNs will go offline, the performance will never be right, fragmentation, etc.), this is recoverable, right?
I'm testing a competitor's product (one known for thin provisioning) and it completely went off the rails after I did this accidentally. I mean... delete everything, start over, and a big fat "don't do that then" from support.
TIA,
Fred
On 2015-11-5 13:54, Fred Grieco wrote:
[...]
Back in the days when traditional volumes were your only option, I think I did once fill a volume. Apart from the serious nose-dive performance takes when you even get near that limit (around 85 or 90%, when the filer starts playing Tetris continuously), and the hassle of actually freeing data (with snapshots holding on to everything that's deleted), we recovered from it without any issues. If we ever actually hit 100% (I can't recall), performance was bad enough that some action had to be taken anyway.
But as I said, that was before aggregates were invented, so YMMV. And NFS-only, no LUNs.
Yea, we don't have that problem. Fill your aggr up? Cool. Shrink the volume. Turn off space guarantees. Delete volumes. Whatever. :)
Used to get the "I used up my space" call all the time when I was in support. Always had a (mostly) happy ending.
The rule I usually use is this:
1) Free space means all the space that is not used by actual data. For most customers, that means adding up free space in the aggregate, free space in the volumes, and unused space in various LUNs and you have your total.
2) Keep capacity at 85% or lower and you should have no worries.
3) As you approach 90%, it's time to start thinking about cleanup or getting more storage.
4) You'll probably start seeing slowdowns as you approach 95%.
What you put inside doesn't really matter. It could be LUNs, could be a ton of flexclones, and you might overprovision by 500% in theory. The provisioning is just a matter of accounting. If you make a 1TB LUN you haven't done anything other than reserve the right to 1TB worth of blocks on the aggregate. Until you actually write data and use those blocks, the aggregate doesn't care about that reservation.
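To put numbers on that, here's a minimal Python sketch of the accounting described above. All the sizes are made-up illustrations (not output from any real system), and it's simplified to thin LUNs in a single aggregate; a real total per rule 1 would also add free space inside thick volumes.

# Made-up figures for illustration only.
aggr_size_tb = 10.0                               # usable aggregate size
lun_provisioned_tb = [1.0, 1.0, 2.0, 4.0, 4.0]    # sizes presented to hosts (12 TB total)
lun_written_tb     = [0.3, 0.2, 1.1, 2.0, 2.4]    # blocks the hosts actually wrote (6 TB)

# Rule 1: free space is everything not occupied by real data.  The aggregate
# only cares about written blocks, not about the sizes promised to the hosts.
used_tb = sum(lun_written_tb)
free_tb = aggr_size_tb - used_tb
overcommit = sum(lun_provisioned_tb) / aggr_size_tb      # 1.2 -> 120% provisioned
pct_used = 100.0 * used_tb / aggr_size_tb                # 60% actually consumed

print(f"used {pct_used:.0f}%, free {free_tb:.1f} TB, "
      f"provisioned {overcommit:.0%} of the aggregate")

# Rules 2-4: alert on *real* consumption, not on what was provisioned.
if pct_used >= 95:
    print("expect slowdowns")
elif pct_used >= 90:
    print("clean up or buy more storage")
elif pct_used > 85:
    print("keep an eye on this aggregate")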
I presented at Insight and one of the recommendations I made was embracing thin provisioning across the board. Thin provision the LUNs, the clones, the volumes. Everything, and then just pay attention to the reported free space on the aggregate. It's just easier all around. A storage array is no longer a big box of disks the way it used to be. The best way to get value out of an array is to think of it as one giant data repository. Thinking of a LUN or a file as something tangible is an illusion.
My input on this: knowing how fast you can BUY and install hardware will determine how much headroom you need to leave in place below 95%, based on your fill rate.
If it's gonna take you three months to go from quote to racked, and your fill rate is 5% per month, manage your expectations accordingly.
_________________________________
Jeff Mohler
Tech Yahoo, Storage Architect, Principal
(831) 454-6712
YPAC Gold Member
Twitter: @PrincipalYahoo
CorpIM: Hipchat & Iris
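To make that arithmetic explicit, a tiny sketch using those example numbers; the 95% ceiling is an assumption borrowed from the thresholds earlier in the thread, not anything Jeff specified.

# Back-of-the-envelope headroom check (illustrative numbers only).
fill_rate_pct_per_month = 5.0     # how fast the aggregate fills up
lead_time_months        = 3.0     # quote -> PO -> delivery -> racked
ceiling_pct             = 95.0    # utilization you never want to reach

burned_while_waiting = fill_rate_pct_per_month * lead_time_months   # 15%
order_trigger_pct = ceiling_pct - burned_while_waiting               # 80%
print(f"get the order in no later than {order_trigger_pct:.0f}% used")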
Jeffrey Steiner wrote:
The rule I usually use is this: [...]
You'll probably start seeing slowdowns as you approach 95%.
That threshold depends on the workload -- among a bunch of other things, the level of random overwrites, whether you're using dedup or not, etc. Basically: what is the workload doing to the free space in the Aggr, and can free_space_realloc [on | no_redirect] keep it nice and clean? If not...
More often than not you'll see slowdowns, especially for writes (higher latency and/or spikes), long before you reach 95%. More like 85%+ somewhere, I'd say. Sure, I have a very heavy, nasty NFSv3 workload here that most people will probably never see, but >90% here is a really, really bad idea.
You need to aim for keeping it around 80% max. Trying to do a reallocate -A on an Aggr with less free space than that isn't pleasant, trust me.
/M
I would expect that after filling an aggregate, the performance deficit would go away once you free up the blocks, unless the blocks you free happen to be a huge hot spot.
Think of it this way: the reason for the slowdown when the aggregate is nearly full is that WAFL doesn't have enough contiguous free space to write a chain of contiguous blocks to each disk. Instead it has to hold back the CP, read the closest thing it can find to a contiguous group of chains into memory, fill in the holes with what it wanted to write, and then put the Tetris back down to the disks. These are known as cp_reads, and unless write I/O is very low they will cause back-to-back CPs, which translates to high write latency and angry users.
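For illustration only, here's a deliberately crude toy model of that effect in Python. It is not WAFL internals: it just assumes new data can only land in fixed-size segments, and that any already-used blocks in a touched segment have to be read back in before the segment is rewritten. Segment size and block counts are arbitrary; the point is simply how the reads-per-write penalty grows as the region fills.

# Toy model, not actual WAFL behavior.
import random

random.seed(0)
BLOCKS = 100_000   # blocks in our pretend region
SEG = 16           # blocks per write segment ("chain")

def reads_per_write(pct_used):
    """Old blocks read back per new block written, with pct_used of the
    region already allocated at random."""
    used = [random.random() < pct_used / 100 for _ in range(BLOCKS)]
    reads = writes = 0
    for s in range(0, BLOCKS, SEG):
        free = SEG - sum(used[s:s + SEG])
        if free:                    # segment still has holes to fill
            writes += free          # new blocks we can place here
            reads += SEG - free     # existing blocks dragged along for the rewrite
    return reads / writes

for pct in (50, 85, 90, 95):
    print(f"{pct}% used -> ~{reads_per_write(pct):.1f} old blocks read per new block written")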
Agreed, the thresholds definitely depend on the workload. I've got one banking customer with a database environment pushing 400MB/sec of redo logging where every microsecond of latency counts. They keep capacity capped at 85% to avoid problems.
On the other hand, when I worked at my prior employer, a well-known database and application company located somewhere near Palo Alto, we had NetApp systems that didn't even have a problem at 98% capacity because the workloads were almost entirely random reads.
Thanks for the great discussion, everyone!
From: "Steiner, Jeffrey" Jeffrey.Steiner@netapp.com To: Michael Bergman michael.bergman@ericsson.com; Toasters toasters@teaparty.net Sent: Friday, November 6, 2015 2:38 AM Subject: RE: Completely filling and aggregate
Agreed, the threshholds definitely depend on the workload. I've got one banking customer with a database environment pushing 400MB/sec of redo logging where every microsecond of latency counts. They keep the capacity capped at 85% to avoid problems.
On the other hand, when I worked at my prior employer, a well-known database and application company located somewhere near Palo Alto, we had NetApp systems that didn't even have a problem at 98% capacity because the workloads were almost entirely random reads.
-----Original Message----- From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Michael Bergman Sent: Friday, November 06, 2015 12:06 AM To: Toasters Subject: Re: Completely filling and aggregate
Jeffrey Steiner wrote:
The rule I usually use is this: [...]
4) You'll probably start seeing slowdowns as approach 95%.
That threshold depends on the workload -- among a bunch of other things the level of random overwrites -- if you're using dedup or not, etc. Baiscally: what is the workload doing to your free space in the Aggr, and can free_space_realloc [on | no_redirect] hold it nice and clean? If not...
More often than not you'll see slowdown, especially for W (higher latency, and/or spikes) long before you reach 95%. More like 85+ somewhere, I'd say. Sure, I have very heavy nasty NFSv3 workload here, most ppl prob will never see such stuff, but >90% here is a really really bad idea
Need to aim for having it around 80% max. Trying to do a reallocate -A with less than that avail in an Aggr isn't pleasant trust me
/M
_______________________________________________ Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters