The stats I'm looking at are from the "lun stats" command. For example, here are some of the worst lun stats lines for a 1-hour interval last Friday.
They show an average latency of about 50 ms.
 Read Write Other QFull  Read Write Average  Queue Partner     Lun
  Ops   Ops   Ops          kB    kB Latency Length  Ops kB
   10    61     0     0   180  1051   50.55   4.03    0  0  /vol/v_fnce20p_db/q_fnce20p_db/lun2  201401171200
   11    61     0     0   171  1070   50.16   4.03    0  0  /vol/v_fnce20p_db/q_fnce20p_db/lun3  201401171200
   10    60     0     0   168  1046   50.38   4.02    0  0  /vol/v_fnce20p_db/q_fnce20p_db/lun4  201401171200
   10    61     0     0   168  1063   49.80   4.02    0  0  /vol/v_fnce20p_db/q_fnce20p_db/lun1  201401171200
   11    60     0     0   171  1051   50.16   4.02    0  0  /vol/v_fnce20p_db/q_fnce20p_db/lun0  201401171200
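(For reference, the columns match what 7-mode "lun stats -o" prints, so something along these lines should reproduce them; I'm quoting the syntax from memory, and the trailing 201401171200 field is just the interval start timestamp, the same one as in the sysstat note below:

   filer> lun stats -o -i 3600 -c 1 /vol/v_fnce20p_db/q_fnce20p_db/lun0

-o adds the extended columns such as QFull and Partner ops, -i is the sample interval in seconds, -c the number of samples.)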
It looks like they are mostly write ops, but the latency isn't broken down into reads vs. writes. The queue length is around 4.
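If it would help to split that 50 ms into reads vs. writes, I think the counter manager exposes per-LUN read and write latency separately. Something like the following should do it, though I'm going from memory on the counter names ("stats list counters lun" shows what's actually available on a given release):

   filer> stats show -i 60 -n 1 lun:*:avg_read_latency lun:*:avg_write_latency

That takes one 60-second sample and prints the average read latency and average write latency for every LUN.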
The only snapshot activity is snapmirror/snapvault operations, which occur every 6 hours, and the snapmirror/snapvault updates would have occurred for this volume during this 1-hour interval.
I have the sysstat from the same 1hr interval (5 min intervals for 1hr starting 201401171200). The head really isn't very busy. This head is SAN only, so the net activity is snapmirror/snapvault activity.
 CPU   NFS  CIFS  HTTP  Total   Net kB/s    Disk kB/s    Tape kB/s  Cache Cache   CP   CP Disk  OTHER   FCP iSCSI    FCP kB/s   iSCSI kB/s
                                 in   out   read  write   read write   age   hit time   ty util                      in    out     in   out
 40%     0     0     0   5050   320 12672  34895  23444      0     0   14s   96%  39%   54  39%     9  5041     0 15762  88420      0     0
 28%     0     0     0    757   530 21550  42006  15396      0     0   26s   93%  33%   42  38%     8   749     0  9284  16318      0     0
 25%     0     0     0    838   209  8578  22343  27128      0     0    3s   94%  50%   88  31%    10   828     0 15800  13103      0     0
 46%     0     0     0    700  1871 75567  83102  39238      0     0    1s   91%  43%   47  43%     7   693     0 27099  11266      0     0
 14%     0     0     0    482   147  8675  21454   9020      0     0     1   96%  37%   19  31%     9   473     0  6240  13769      0     0
 17%     0     0     0    702   143  8047  28066  12103      0     0     1   95%  36%   22  41%     9   693     0  8235  21790      0     0
 19%     0     0     0    702   137  7922  26423  24004      0     0     1   95%  38%   35  39%     8   694     0 16770  17597      0     0
 18%     0     0     0    642   156  8456  26082  14530      0     0    0s   93%  44%   20  44%     9   633     0  8983  15153      0     0
 20%     0     0     0   1005   157  8168  33210  14510      0     0     1   93%  27%   30  53%    10   995     0  9724  23917      0     0
 20%     0     0     0   1033   137  7337  25215  20989      0     0     1   92%  33%   35  45%     8  1025     0 14157  17978      0     0
 19%     0     0     0    860   136  7757  28052  14805      0     0     1   94%  27%  30f  48%     8   852     0  9464  21196      0     0
 19%     0     0     0    860   136  7757  28052  14805      0     0     1   94%  27%  30f  48%     8   852     0  9464  21196      0     0
 24%     0     0     0   1101   150  9264  33514  18724      0     0     1   94%  32%   30  49%     9  1092     0 12129  25414      0     0
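(For reference, that's ordinary extended sysstat output, i.e. roughly

   filer> sysstat -x 300

left running across the hour, with -x giving the extra FCP/iSCSI, CP and disk-util columns and 300 being the seconds per sample.)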
-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Michael Bergman
Sent: Monday, January 20, 2014 9:47 AM
To: Toasters
Subject: Re: reallocate effect on snapmirror and snapvault
rrhodes@firstenergycorp.com wrote:
I'm thinking of running a "reallocate start -p" on a couple volumes that are showing high latency disk accesses for luns. The volume is a source for both snapmirror and snapvault operations.
Right, you do have active snapshots (since you're mentioning SnapVault below), I take it. In that case:

   reallocate start -f -p <vol_name>

Are you seeing this high latency above all for READ I/Os on these LUNs / Vols? If it's for WRITEs, your issue might be a different one and your efforts running that command won't do as much good. It could still be layout related, e.g. Free Space Fragmentation.
(I'm not at all sure what you mean by "disk accesses" in this context.)
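One thing you could do before committing to the full job is to let ONTAP measure how (sub)optimal the layout actually is. Roughly like this, 7-mode syntax quoted from memory, so do check the man page on your release:

   reallocate measure -o /vol/<vol_name>
   reallocate status -v /vol/<vol_name>

The measure-once scan computes an optimization value for the volume (1 is optimal, higher means more fragmented), and reallocate status -v shows it when the scan has finished. If that number is already low, reallocate start -p won't buy you much.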
I was reading up on this command and I can't find any real info on the effect of running this on the snapmirror and snapvault operations.
Question: If I run a "reallocate start -p" on a volume, will snapmirror/snapvault see this and send all reallocated blocks? Also, can snapmirror/snapvault operations be running during the reallocate?
I'm not 100% certain (yet), but incidentally I've been digging around in this area myself for a while now, looking at the effects on SnapVault and the snapshots it uses on the source. And as you say, the info is rather sparse.
To my current knowledge, the answer to your first Q is: no. However, for some time there will be extra block redirects inside WAFL when blocks from snapshots are read, and that might slow things down. No idea how much :-( This is automatically corrected by the reallocate -p, which starts a WAFL redirect scan in the background that cleans this up for the FlexVol in question. It looks like this (example):
$ xyz 'priv set -q diag; wafl scan status' | egrep redirect | sed 's/ //'
138932 redirect public phase 1 level 2 inode 5970287 of 33554409 (2%)
138942 redirect private phase 3 level 1 block 470560972 of 771868448 (22%)
139000 redirect public phase 1 level 1 inode 19865752 of 33554409 (1%)
139884 redirect public phase 1 level 1 inode 24542520 of 33554409 (3%)
139054 redirect public phase 3 level 1 block 349198358 of 410692272 (33%)
140078 redirect public phase 1 level 2 inode 32509281 of 33554409 (7%)
140519 redirect public phase 1 level 1 inode 11041527 of 113246091 (0%)
140427 redirect public phase 1 level 1 inode 2218329 of 33554409 (0%)
139956 redirect public phase 1 level 1 inode 11865625 of 33554409 (1%)
140125 redirect public phase 1 level 1 inode 17613080 of 33554409 (1%)
138844 redirect private phase 3 level 1 block 305546101 of 537103936 (21%)
138991 redirect public phase 1 level 2 inode 19646076 of 33554409 (2%)
140192 redirect public phase 1 level 1 inode 24762423 of 33554409 (3%)
139853 redirect public phase 3 level 1 block 61740937 of 258940032 (14%)
140521 redirect public phase 2 level 1 block 2279190 of 7518795 (10%)
During this scan, the FlexVol continues to operate normally in every other respect. You can even abort such a scan cleanly and have it implicitly restarted later, with no harm to anything in the FlexVol.
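And should it get in the way of something (a backup window, say), the reallocate job itself can be paused and resumed rather than killed; from memory the 7-mode commands are:

   reallocate quiesce /vol/<vol_name>
   reallocate restart /vol/<vol_name>
   reallocate status -v

quiesce pauses the job, restart picks it up again, and status -v shows how far it has got (reallocate stop would remove the job altogether).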
To the second Q: yes
/M