Eep no, that 5% aggr snapshot space is really important in metros.
On 26 March 2016 at 14:48, Klise, Steve Steve.Klise@wwt.com wrote:
Lame question, but if the aggr is at the standard 5% snapshot reserve, maybe freeing up that space would help (setting it to 0)?
Thanks
Steve Klise
From: toasters-bounces@teaparty.net on behalf of Jeffrey Mohler <jmohler@yahoo-inc.com> Reply-To: Jeffrey Mohler <jmohler@yahoo-inc.com> Date: Friday, March 25, 2016 at 7:23 PM To: "Flores, Paul" <Paul.Flores@netapp.com>, josef radinger <cheese@nosuchhost.net> Cc: "toasters@teaparty.net" <toasters@teaparty.net> Subject: Re: metro under high latency
Ya...I wouldn't get near reallocation that you really =don't= need, given the CPU and disk use, and the secondary effects that will add to your system.
Free space frag is only an issue if it's an issue...and disk that's 'not-busy' is soaking up the additional IO from it cleanly, without any of the other secondary effects of fixing a problem that ain't there.
Work on the actual problem. Alignment...if it does exist as an issue, it chews up CPU, chews up disk IO, and makes the CP process slow and latent to client ACKs.
Jeff Mohler jmohler@yahoo-inc.com Tech Yahoo, Storage Architect, Principal (831)454-6712 YPAC Gold Member Twitter: @PrincipalYahoo CorpIM: Hipchat & Iris
On Friday, March 25, 2016 6:49 PM, "Flores, Paul" Paul.Flores@netapp.com wrote:
Josef,
If you want to figure out whether you are suffering from overhead from a lack of contiguous free space, you can compute it from statit with the following formula:
(CP Reads * chain) / (writes * chain) = CPRead Ratio. This can be done on any data disk in a raid group.
this will give you a value expressing the amount of parity read (CP Read) workload vs write workload. Below .5 is very efficient… .51 - .75 is usually not noticeable, but may need some attention… .75 - 1 is bad, and more than 1 is _very_ bad… Generally, the closer the ratio gets to 1:1, the less efficient the controller is, because it's having to do so much CP Read IO for a given write stripe.
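The formula and thresholds above can be sketched as a small helper. This is a hypothetical illustration: the parameter names are assumptions, and the per-disk counter values would have to be read manually from the data-disk rows of the statit output.

```python
# Hypothetical sketch of the CPRead ratio check described above.
# Inputs are per-disk values taken from statit (not parsed automatically).

def cpread_ratio(cp_reads: float, cp_read_chain: float,
                 writes: float, write_chain: float) -> float:
    """(CP Reads * chain) / (writes * chain) for one data disk."""
    return (cp_reads * cp_read_chain) / (writes * write_chain)

def classify(ratio: float) -> str:
    """Map a ratio onto the rough bands given in the mail."""
    if ratio < 0.5:
        return "very efficient"
    elif ratio <= 0.75:
        return "usually not noticeable, may need attention"
    elif ratio <= 1.0:
        return "bad"
    return "very bad"

# Example: a disk doing 30 CP reads/s (chain 8) vs 120 writes/s (chain 16)
r = cpread_ratio(30, 8, 120, 16)
print(round(r, 3), "->", classify(r))  # 0.125 -> very efficient
```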
The ratio has been steadily climbing since January… (see default.png attachment for chart). Is there a new type of workload on the system that is either quickly filling the aggr, or has a _lot_ of add and delete characteristics to it (creating lots of holes in the stripes)? Even so, I'd be surprised to see it affecting write latency on its own this early in the game; it usually has to be a bit higher before anyone notices. But there is no denying, there does seem to be some correlation with the latency increases on your VM volumes as the ratio gets worse…
you will want your aggr option for free_space_realloc=on, I think, if you want to ensure that free space stays happy for the future. TR-3929 'Reallocate Best Practices Guide' has all the gory details.
you may also want to take a peek at nfsstat -d on your controller… there are some indications of VMDK files that may not be aligned properly, and checking the last couple of months, there were a few spots where the controller ran out of partial write handling resources… probably not the overall issue, but 'one more thing' to be concerned with.
Paul Flores Professional Services Consultant 3 Americas Performance Assessments Team NetApp 281-857-6981 Direct Phone 713-446-5219 Mobile Phone paul.flores@netapp.com
http://www.netapp.com/us/solutions/professional/assessment.html
On 3/25/16, 3:29 PM, "toasters-bounces@teaparty.net on behalf of josef radinger" <cheese@nosuchhost.net> wrote:
cpu on netapp is higher than it used to be. i think we've been seeing higher cpu than normal for around a month. we used to be at 30-40%, and now during work hours we are slightly higher at 50-60%, with some peaks to around 90%. i'm currently not in my office and have no access to exact statistics.
On Fri, 2016-03-25 at 19:41 +0000, Jeffrey Mohler wrote:
They should be 100% on an empty aggregate..but still, spindles seem to handle the workload just fine.
What's the history of CPU on the system? When did it last work well, and what was the CPU load then?
Jeff Mohler Tech Yahoo, Storage Architect, Principal (831)454-6712 YPAC Gold Member Twitter: @PrincipalYahoo CorpIM: Hipchat & Iris
On Friday, March 25, 2016 12:28 PM, josef radinger <cheese@nosuchhost.net> wrote:
performance advisor shows the following: read latency on a vmware-datastore at around 5-20ms; write latency at around 10-15ms with peaks; "other" latency at up to 500ms. i'm quite sure this "other" is my problem.
but what bothers me is the stripe ratio: 2428.65 partial stripes vs 85.83 full stripes. my understanding is that i should have a lot more full stripes than partial ones.
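For scale, the two stripe counters quoted above work out to only a few percent full stripes:

```python
# Quick arithmetic on the stripe counters from the statit output quoted
# in this mail: what fraction of write stripes are full?
partial = 2428.65
full = 85.83
full_pct = 100 * full / (partial + full)
print(f"{full_pct:.1f}% full stripes")  # roughly 3.4%
```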
images are at http://www.nosuchhost.net/~cheese/temp/readandwrite.png http://www.nosuchhost.net/~cheese/temp/other.png
my colleagues had troubles while patching several windows-systems residing in that datastore, as the systems got unresponsive and access got very slow.
On Fri, 2016-03-25 at 18:42 +0000, Jeffrey Mohler wrote:
Are you write latency troubled, or read latency troubled?
I don't see free space frag as a huge issue, as lightly loaded as the spindles report to be.
Jeff Mohler Tech Yahoo, Storage Architect, Principal (831)454-6712 YPAC Gold Member Twitter: @PrincipalYahoo CorpIM: Hipchat & Iris
On Friday, March 25, 2016 11:37 AM, josef radinger <cheese@nosuchhost.net> wrote:
hi
i have a metro cluster (rather old) which is currently responding very slowly. 7-mode 8.1.4
there is only one aggregate per head, filled at around 72% on one head and 79% on the other. attached is a statit and sysstat -x 1 from one head.
i see lots of partial stripes and only a few full stripes. i assume this means not enough free space, which imho should not be a problem on my aggregates.
what is the correct procedure for performing a free space reallocation? i did:
- stop all volume-reallocates
- disable read_reallocation on all volumes
- raid.lost_write.enable off
- aggr options aggr0 resyncsnaptime 5
- reallocate start -A -o aggr0
- wait for finish of reallocate
- aggr options aggr0 resyncsnaptime 60
- enable read_reallocation on all volumes
- reenable all volume-level reallocates
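as a concrete console sketch, the steps above would look roughly like this on 7-mode (volume names are placeholders, and the exact option names should be verified against the ONTAP 8.1 man pages before running anything):

```
filer> reallocate stop /vol/vol1           # repeat for each scheduled volume job
filer> vol options vol1 read_realloc off   # repeat for each volume
filer> options raid.lost_write.enable off
filer> aggr options aggr0 resyncsnaptime 5
filer> reallocate start -A -o aggr0
filer> reallocate status -v                # poll until the aggr scan completes
filer> aggr options aggr0 resyncsnaptime 60
filer> vol options vol1 read_realloc on    # repeat for each volume
filer> reallocate start /vol/vol1          # restore volume-level schedules
```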
my aggregates have options:
aggr options aggr0 root, diskroot, nosnap=off, raidtype=raid_dp, raidsize=20, ignore_inconsistent=off, snapmirrored=off, resyncsnaptime=60, fs_size_fixed=off, snapshot_autodelete=on, lost_write_protect=on, ha_policy=cfo, hybrid_enabled=off, percent_snapshot_space=5%, free_space_realloc=off
any advice? josef
Toasters mailing list Toasters@teaparty.net http://www.teaparty.net/mailman/listinfo/toasters