And here is some iostat output from one of the Solr slaves during the same timeframe:
12/03/2015 06:48:36 PM
avg-cpu: %user %nice %system %iowait %steal %idle
7.54 0.00 7.42 44.12 0.00 40.92
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 4.50 0.00 0.00 0.00 0.00 0.00 5.46 0.00 0.00 62.65
sdb 0.00 26670.00 0.00 190.50 0.00 95.25 1024.00 162.75 214.87 5.25 100.00
dm-0 0.00 0.00 1.00 11.50 0.00 0.04 8.00 5.59 0.00 50.12 62.65
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 3.00 0.00 0.01 8.00 2.44 0.00 135.33 40.60
dm-3 0.00 0.00 0.00 26880.00 0.00 105.00 8.00 20828.90 194.77 0.04 100.00
12/03/2015 06:48:38 PM
avg-cpu: %user %nice %system %iowait %steal %idle
9.23 0.00 16.03 24.23 0.00 50.51
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 177.00 1.00 19.50 0.00 0.79 78.83 7.91 651.90 16.59 34.00
sdb 0.00 73729.00 0.00 599.50 0.00 299.52 1023.23 142.51 389.81 1.67 100.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.56 0.00 0.00 27.55
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 186.50 0.00 0.73 8.00 87.75 483.59 1.82 34.00
dm-3 0.00 0.00 0.00 74310.00 0.00 290.27 8.00 18224.54 402.32 0.01 100.00
12/03/2015 06:48:40 PM
avg-cpu: %user %nice %system %iowait %steal %idle
9.27 0.00 10.04 22.91 0.00 57.79
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 24955.50 0.00 202.00 0.00 101.00 1024.00 142.07 866.56 4.95 100.05
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 25151.50 0.00 98.25 8.00 18181.29 890.67 0.04 100.05
12/03/2015 06:48:42 PM
avg-cpu: %user %nice %system %iowait %steal %idle
9.09 0.00 12.08 21.95 0.00 56.88
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 2.50 0.00 1.50 0.00 0.01 18.67 0.46 36.33 295.33 44.30
sdb 0.00 59880.50 0.00 461.50 0.00 230.75 1024.00 144.82 173.12 2.17 99.95
dm-0 0.00 0.00 0.00 1.00 0.00 0.00 8.00 0.81 0.00 407.50 40.75
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 3.50 0.00 0.01 8.00 0.13 37.29 10.14 3.55
dm-3 0.00 0.00 0.00 60352.50 0.00 235.75 8.00 18538.70 169.30 0.02 100.00
As you can see, we are getting decent throughput (roughly 95 to 300 MB/s of writes on sdb), but it causes latency to spike on the filer. I have heard that avgrq-sz in iostat is related to the block size; can anyone verify that? Is a 1MB block size too much for the filer? I am still researching whether there is a way to modify this in Solr, but I haven't come up with much yet. Note that the old Solr slaves were physical DL360p's with only a local 2-disk 10k RAID1. The new slaves and relay-master are all connected at 10Gb, which removed the 1Gb network bottleneck for replication and could be uncorking the bottle, so to speak. I'm still at a loss as to why this is hurting the filer so much, though.
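
On the avgrq-sz question, one rough way to sanity-check the units is to compare avgrq-sz against wMB/s divided by w/s from the same sample. The sketch below assumes avgrq-sz is reported in 512-byte sectors (which is how the sysstat man page describes it) and uses the sdb and dm-3 figures from the 06:48:36 sample above. The function names are just illustrative. If the assumption holds, 1024 sectors would work out to 512 KiB per request rather than 1 MB.

```python
# Assumption: iostat reports avgrq-sz in 512-byte sectors.
SECTOR_BYTES = 512


def avg_request_kib(avgrq_sz_sectors):
    """Convert an avgrq-sz value (sectors) to KiB per request."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


def kib_per_write(w_mb_per_s, writes_per_s):
    """Cross-check: average write size in KiB from wMB/s and w/s."""
    return w_mb_per_s * 1024 / writes_per_s


# (avgrq-sz, wMB/s, w/s) copied from the 06:48:36 PM sample.
samples = {
    "sdb": (1024.00, 95.25, 190.50),
    "dm-3": (8.00, 105.00, 26880.00),
}

for device, (avgrq_sz, w_mb, w_s) in samples.items():
    from_sectors = avg_request_kib(avgrq_sz)
    from_rates = kib_per_write(w_mb, w_s)
    print(f"{device}: {from_sectors:.1f} KiB from avgrq-sz, "
          f"{from_rates:.1f} KiB from wMB/s / w/s")
```

The two calculations agree for both devices in that sample, which suggests dm-3 is being handed small writes that get merged into much larger requests by the time they reach sdb. The wrqm/s plus w/s on sdb (26670.00 + 190.50) is close to the w/s on dm-3 (26880.00), which fits that reading, though it is only one sample.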
Any ideas?
--
GPG keyID: 0xFECC890C
Phil Gardner