Michael, my testing shows that using just CP Read Ops / Write Ops generates a lot of false positives, and makes it harder to predict when things will go sideways. After sitting and thinking about _why_ for a good long while, I realized the underlying issue with just looking at ops is that it doesn’t quantify the IO impact to disks for each op type in a comparable way. In other words, we are looking at a bucket of IO, and treating someone dumping in a full bucket of water the same as someone dumping in a glass of water (for the purposes of figuring overhead).
So, if I want to quantify how much overhead is going toward dealing with a lack of contiguous free space in an aggregate (whose impact is how much time/IO I’m giving up from my write cycle to do reads to correct parity on the stripe), we need to calculate how much overhead IO is being consumed by the CP Reads. If the IO overhead is low, it’s not going to affect performance, regardless of the count of ops for either type. So, let’s say, for example, you have an aggregate with only 5 writes/s, with a chain length of 52… but the CP Reads are 20/s, with a chain length of 1 or 2.
If I just look at ops, I’m 20:5 CP Reads:Writes, or _4_ on the ‘am I healthy’ number… if we take chain length into account, it’s 40:260, or .15 on the ‘am I healthy’ number. While it’s very true that high CP Read ops vs write ops is a good thing to check when a system is already suffering, it’s not always a reliable metric to use if you are trying to monitor a system to ensure this overhead doesn’t sneak up on you.
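In rough Python terms (a minimal sketch of that same arithmetic, my own illustration and not any shipped tooling), the two numbers work out like this:

    def ops_ratio(cpread_ops, write_ops):
        # Naive 'am I healthy' number: CP Read ops vs write ops.
        return cpread_ops / write_ops

    def chain_weighted_ratio(cpread_ops, cpread_chain, write_ops, write_chain):
        # (CP Reads * chain) / (writes * chain): weights each op type
        # by how much disk IO it actually represents.
        return (cpread_ops * cpread_chain) / (write_ops * write_chain)

    # Example from above: 5 writes/s at chain length 52, 20 CP Reads/s at chain length 2.
    print(ops_ratio(20, 5))                    # 4.0   -> looks scary
    print(chain_weighted_ratio(20, 2, 5, 52))  # ~0.15 -> actually healthy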
It’s possible to chart and trend both sets of numbers over months and, with the ‘chain length’ method, make correlations with performance impacts that are very predictable. There are way too many exceptions with the ‘ops’ method to reliably make correlations. Just on a lark (and it’s actually in one of the factory CMPG configs now), I also added the OPS ratio to the existing (CPREAD*CHAIN)/(WRITES*CHAIN) math as a ‘multiplying’ factor, to extend the range of the numbers a bit more, and I found that until you are consistently over .75 on the chain math, the CP Read ops / write ops ratio doesn’t go high enough to move the result in a meaningful way. Once you go past that number, the extra CP Read ops magnify the original ratio quite nicely, and highlight the point where things begin to feel bad on the controller, as opposed to just being very busy.
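(The exact form of that ‘multiplying factor’ math isn’t spelled out here, so the Python below is just one plausible reading of it, not the actual CMPG config:)

    def extended_score(cpread_ops, cpread_chain, write_ops, write_chain):
        # Hypothetical form: chain-weighted ratio scaled by the raw ops ratio.
        chain_ratio = (cpread_ops * cpread_chain) / (write_ops * write_chain)
        ops_ratio = cpread_ops / write_ops
        # Observation from the post: until the chain math is consistently over
        # ~0.75, the ops ratio doesn't move the result much; past that, the
        # extra CP Reads magnify the score and flag the point where the
        # controller starts to feel bad rather than just busy.
        return chain_ratio * ops_ratio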
Since I’m all about automating and simplifying those kinds of things so an expert doesn’t have to identify all the exceptions, I prefer to push the ‘chain length’ method, because it doesn’t require much in the way of extra caveats.
pF
On 3/29/16, 9:12 AM, Michael Bergman <michael.bergman@ericsson.com> wrote:
Some comments from my IRL experiences with this. Caveat lector: YMMV. It all depends on the exact nature of the workload.
Paul Flores wrote:
If you want to figure out if you are suffering from overhead from a lack of contiguous free space, you can compute it from statit with the following formula:
(CP Reads * chain) / (writes * chain) = CPRead Ratio. This can be done on any data disk in a raid group.
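(If you want to script that against statit output, here is a rough Python sketch. The column layout of the per-disk lines, with ureads/writes/cpreads each followed by chain and usecs, is an assumption from memory; verify it against your own statit output before trusting the numbers:)

    def cpread_ratio_from_statit_line(line):
        # ASSUMED layout of a statit data-disk line (check on your system):
        #   disk  ut%  xfers  ureads chain usecs  writes chain usecs  cpreads chain usecs  ...
        f = line.split()
        write_ops, write_chain = float(f[6]), float(f[7])
        cpread_ops, cpread_chain = float(f[9]), float(f[10])
        denom = write_ops * write_chain
        return (cpread_ops * cpread_chain) / denom if denom else float("inf")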
My experience is that the chain lengths don't really matter much to this procedure / calculation. It's the actual read ops vs actual user_writes. The amount of data, the bandwidth, that flows there depends more than anything else on the nature of the workload.
And, short chain length = more cp_read ops... pretty much
this will give you a value expressing the amount of correct parity (CP Read) workload vs write workload. below .5 is very efficient… .51 - .75 is usually not noticeable, but may need some attention… .75 - 1 is bad, and more than 1 is _very_ bad…
Here are my ranges:
<1 is really good (certainly if you have many spindles in your backend and the disk_busy is low overall)
1-1.5 is ok, may be noticed but not much (depends on the workload)
2 and above is pretty bad. But may still not be a disaster. It depends ;-)
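(As a quick reference for anyone wiring this into monitoring, here is a small Python helper, an illustration only and not from either post, that maps a value onto the two sets of bands. Note Paul's bands apply to the chain-weighted ratio, Michael's to the plain cp_reads/user_writes ops ratio:)

    def paul_band(chain_weighted_ratio):
        # Paul's thresholds, applied to (CP Reads * chain) / (writes * chain).
        if chain_weighted_ratio < 0.5:
            return "very efficient"
        if chain_weighted_ratio <= 0.75:
            return "usually not noticeable, may need attention"
        if chain_weighted_ratio <= 1.0:
            return "bad"
        return "very bad"

    def michael_band(ops_ratio):
        # Michael's thresholds, applied to plain cp_reads / user_writes.
        if ops_ratio < 1.0:
            return "really good (esp. many spindles, low disk_busy)"
        if ops_ratio < 1.5:
            return "ok, may be noticed but not much"
        if ops_ratio < 2.0:
            return "grey zone (not covered above)"
        return "pretty bad, but maybe not a disaster"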
Note that it is pretty darn difficult for a system with a very random aggregated workload to keep this ratio <1 for any longer than max 5-7 days. It depends on the FAS model of course, but for FSR (the free_space_realloc option on the Aggrs) to keep this nice, it needs cycles, and if the foreground workload is heavy enough, it will invariably lag behind, so slowly the cp_reads/user_writes ratio will grow larger and larger over time.
The *only* constructive thing one can do about this is reallocate -A (aggregate reallocate). It's very impactful w.r.t. protocol latency and my advice is: don't do it unless you have max 75% vol util of your Aggregate. With any more than that, it'll likely be more pain than it's worth to run a reallocate -A on that Aggr.
The ratio has been steadily climbing since January… (see default.png attachment for chart). Is there a new type of workload on the system that is either quickly filling the aggr, or has a _lot_ of add and delete characteristics to it (creating lots of holes in the stripes)?
This is pretty much the worst thing you can do to WAFL and it is exactly the type of thing we have here too:
write, delete, write, delete, write, delete [lather, rinse, repeat]
Free Space Fragmentation will invariably result. And the only thing you can do is... [see above]
you will want your aggr option for free_space_realloc=on, I think, if you want to ensure that free space stays happy for the future. TR-3929 'Reallocate Best Practices Guide' has all the gory details.
I agree. It is a *very* good idea to have FSR on. On all Aggrs. It *does* take some CPU cycles of course, and some IOPS too. Plus it may not be able to keep up, if there's too much of the type of workload pattern I described above. Sucks to be you and the only thing you can do about it is... [see above] :-) Or rather :-\
N.B. Make sure you turn on FSR *just* after you have finished a... yep, you guessed it, reallocate -A. Unless you did turn it on when the Aggregate was pristine. No..? I'm sorry.
BTW: before you run reallocate -A, turn off FSR on that Aggregate or you may be very surprised about the behaviour of the system or the end result (or both). And make sure no Dedup jobs run on anything on that Aggr during the reallocate -A, or it will almost certainly wreak havoc and ruin everything
If anyone wants to try this reallocate -A procedure out, you can shoot me an e-mail and I'll give you a procedure (1,2,3,... to follow). One which I've used many many times on very large Aggregates so it's well tested. It's semi manual so a bit of a pain, but better than nothing
Cheers, M
--
Michael Bergman                              michael.bergman@ericsson.com
Sr Systems Analyst / Storage Architect
Engineering Hub Stockholm                    Phone +46 10 7152945
EMEA N, Operations North, IT Ops Kista       SMS/MMS +46 70 5480835
Ericsson, Torshamnsg 33, 16480 Sthlm, Sweden
--
This communication is confidential. We only send and receive email on the basis of the terms set out at www.ericsson.com/email_disclaimer
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters