I had this issue twice now on two different systems. There will be "sudden" high latency, across the board. Oncommand performance manager will have a some disk layout warnings, and tons of latency warnings with no clear shark. Grafana stats will show high "Wall write cleaning" utilization. Sysstat shows slow flushing CPs with a lot of "s" type... Special files. First time (3250 8.2.3p5 cdot) I had to do a takeover just to break the spell, latency and cpu were so bad. Second time was on a fas8080 8.3.1, so a little more robust. In this case, wafl scan stats showed a long, stalled snap create summary (see bottom, outputs on same color several min apart). This is a Snapmirror source, and aborting the sm processes on the dest fixed the problem and eliminated the snap create wafl scan, even though they were completing regularly (though slowly). The only common denominator here is both of these systems use flashpool. Has anyone seen this before? I'm really baffled that the sm abort fixed the issue on the source. I have a case open with support and so far they are saying "lots of Snapmirror processes" but it's really only about 20... Thanks, Fred <!--#yiv0598718352 .yiv0598718352EmailQuote {margin-left:1pt;padding-left:4pt;border-left:#800000 2px solid;}-->CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s in out read write read write age hit time ty util in out in out 29% 1309 0 0 2644 747 10568 450806 363535 0 0 0s 92% 100% :s 31% 123 1212 0 43230 172098 0 0 18% 278 0 0 2286 435 55557 238412 105938 0 0 0s 94% 100% :s 10% 838 1170 0 47435 103353 0 0 14% 357 0 0 1278 527 13516 110072 20700 0 0 1 94% 100% :s 9% 214 707 0 20912 59405 0 0 17% 506 0 0 3171 611 55294 220118 43358 0 0 0s 93% 100% :s 10% 1052 1613 0 33894 144762 0 0 17% 475 0 0 2025 415 33509 230281 43878 0 0 1 93% 100% :f 14% 587 963 0 20764 109122 0 0 16% 525 0 0 2167 289 4638 78980 62 0 0 1 92% 100% :v 7% 326 1316 0 33645 67102 0 0 24% 766 0 0 2069 681 4712 289509 157377 0 0 0s 94% 93% Zn 16% 289 1014 0 30442 67344 0 0 18% 232 0 0 1518 387 30691 217187 205447 0 0 0s 94% 100% :s 18% 615 671 0 4686 92926 0 0 18% 302 0 0 1830 593 29273 211516 112819 0 0 0s 94% 100% :s 11% 577 951 0 13343 94976 0 0 16% 404 0 0 1381 248 1508 74363 51789 0 0 1s 95% 100% :s 6% 144 833 0 18491 66701 0 0 15% 250 0 0 1437 362 25192 127166 14783 0 0 0s 95% 100% :s 13% 469 718 0 5478 64261 0 0 15% 272 0 0 1315 289 26669 75965 7 0 0 0s 95% 100% :s 5% 488 555 0 9930 56409 0 0 15% 362 0 0 1352 384 9242 126759 56523 0 0 1 92% 100% :s 5% 224 766 0 4169 61645 0 0 17% 393 0 0 1667 527 41316 162630 0 0 0 0s 93% 100% :s 14% 683 591 0 2742 91873 0 0 13% 489 0 0 1847 382 22543 126444 0 0 0 0s 91% 100% :s 10% 416 942 0 13912 98668 0 0 12% 466 0 0 1576 267 1942 125502 22 0 0 >60 92% 100% :s 8% 75 1035 0 25502 105575 0 0 19% 648 0 0 2236 616 12843 180895 7 0 0 1 91% 100% :f 13% 259 1329 0 20573 112682 0 0 17% 589 0 0 2648 732 40612 165832 43495 0 0 1 93% 100% :v 16% 662 1397 0 30105 84571 0 0 13% 298 0 0 1494 224 5258 75630 47 0 0 0s 91% 100% :v 19% 93 1103 0 37230 59553 0 0 25% 961 0 0 2517 611 7662 293316 228741 0 0 0s 95% 93% Zs 24% 70 1486 0 28444 64059 0 0 CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s in out read write read write age hit time ty util in out in out 18% 713 0 0 1802 308 4102 139025 138664 0 0 1s 93% 100% :s 17% 106 983 0 12370 67131 0 0 18% 653 0 0 2290 356 3024 175784 114858 0 0 0s 93% 100% :s 17% 103 1534 0 30419 103879 0 0 14% 277 0 0 1117 264 835 96862 48373 0 0 0s 93% 100% :s 16% 168 672 0 12616 68979 0 0 13% 465 0 0 1236 616 1505 56551 19 0 0 0s 94% 100% :s 13% 150 621 0 10984 51964 0 0 13% 363 0 0 1184 359 724 47141 6 0 0 1 93% 100% :s 14% 175 646 0 15597 41360 0 0 16% 394 0 0 1192 330 1622 150649 62399 0 0 0s 93% 100% :s 21% 149 649 0 13094 89017 0 0 16% 1660 0 0 2609 1180 3836 101862 22 0 0 1 88% 100% :s 15% 203 746 0 12464 58093 0 0 15% 2306 0 0 3503 1574 23707 144890 0 0 0 0s 93% 100% :s 17% 442 755 0 11702 95250 0 0 14% 1118 0 0 2256 570 30514 97837 33885 0 0 1 95% 100% :f 10% 612 526 0 9866 54222 0 0 14% 3636 0 0 4489 1649 7360 70514 66 0 0 1 92% 100% :v 7% 238 615 0 10162 56678 0 0 13% 3540 0 0 4277 1615 2594 63757 0 0 0 1s 91% 100% :v 6% 129 608 0 14768 48771 0 0 16% 2842 0 0 4097 1682 20595 137085 22 0 0 1 93% 100% :v 14% 416 839 0 14408 78167 0 0 15% 2544 0 0 3760 1540 37961 125949 0 0 0 0s 94% 100% :v 10% 692 524 0 10924 100674 0 0 26% 2493 0 0 4307 991 9312 275089 153829 0 0 0s 94% 90% Zs 22% 329 1485 0 44023 126413 0 0 16% 1624 0 0 2577 1154 3959 179468 99585 0 0 0s 93% 100% :s 20% 145 808 0 20926 89793 0 0 19% 2543 0 0 4807 2015 59604 258735 76443 0 0 0s 94% 100% :s 17% 1042 1222 0 30911 147132 0 0 18% 3354 0 0 4264 1452 1771 189401 92019 0 0 0s 94% 100% :s 7% 93 817 0 14723 84496 0 0 13% 3561 0 0 4435 1553 1027 99965 47137 0 0 0s 93% 100% :s 6% 101 773 0 10691 82650 0 0 18% 2808 0 0 3998 1687 6156 174923 125 0 0 1 92% 100% :s 14% 216 974 0 19335 117237 0 0 18% 4372 0 0 6370 2245 55867 242723 67036 0 0 0s 94% 100% :s 10% 1019 979 0 19915 133940 0 0 Volume $$$$$$$$$$$$$$$$$$$$$$$$$$: Scan id Type of scan progress 6 active bitmap rearrangement fbn 19501 of 21126 w/ max_chain_len 311362705 snap create summary update block 3 of 2112711363427 block ownership calculation block 6400 of 2112611363428 blocks used summary update block 889 of 21127 Volume $$$$$$$$$$$$$$$$$$$$$$$$$$: Scan id Type of scan progress 6 active bitmap rearrangement fbn 19501 of 21126 w/ max_chain_len 311362705 snap create summary update block 5 of 2112711363427 block ownership calculation block 6500 of 2112611363428 blocks used summary update block 2096 of 21127 Volume $$$$$$$$$$$$$$$$$$$$$$$$$$: Scan id Type of scan progress 6 active bitmap rearrangement fbn 19502 of 21126 w/ max_chain_len 311362705 snap create summary update block 4 of 21127 (fbn 211)11363427 block ownership calculation block 7100 of 2112611363428 blocks used summary update block 2799 of 21127 (fbn 3952) This e-mail and its attachments are intended only for the individual or entity to whom it is addressed and may contain information that is confidential, privileged, inside information,or subject to other restrictions on use or disclosure. Any unauthorized use, dissemination or copying of this transmission or the information in it is prohibited and may be unlawful. If you have received this transmission in error, please notify the senderimmediately by return e-mail, and permanently delete or destroy this e-mail, any attachments, and all copies (digital or paper). Unless expressly stated in this e-mail, nothing in this message should be construed as a digital or electronic signature. For additionalimportant disclaimers and disclosures regarding KCG’s products and services, please click on the following link:
Update on this -- this is actually a bug specific to thin-provisioned volumes and flashpool. There is an issue deleting snapshots. In this case, it was deleting the oldest on a schedule, and wafl scan was showing snap creation (btw, is there a separate wafl scan line or other evidence of active snap deletion?)
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1007236
From: Fred Grieco fredgrieco@yahoo.com To: Toasters toasters@teaparty.net Sent: Wednesday, April 5, 2017 8:23 PM Subject: Wafl clean stalling
I had this issue twice now on two different systems. There will be "sudden" high latency, across the board. Oncommand performance manager will have a some disk layout warnings, and tons of latency warnings with no clear shark. Grafana stats will show high "Wall write cleaning" utilization. Sysstat shows slow flushing CPs with a lot of "s" type... Special files. First time (3250 8.2.3p5 cdot) I had to do a takeover just to break the spell, latency and cpu were so bad. Second time was on a fas8080 8.3.1, so a little more robust. In this case, wafl scan stats showed a long, stalled snap create summary (see bottom, outputs on same color several min apart). This is a Snapmirror source, and aborting the sm processes on the dest fixed the problem and eliminated the snap create wafl scan, even though they were completing regularly (though slowly). The only common denominator here is both of these systems use flashpool. Has anyone seen this before? I'm really baffled that the sm abort fixed the issue on the source. I have a case open with support and so far they are saying "lots of Snapmirror processes" but it's really only about 20... Thanks, Fred #yiv4848860322 #yiv4848860322 -- .yiv4848860322EmailQuote {margin-left:1pt;padding-left:4pt;border-left:#800000 2px solid;}#yiv4848860322 CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s in out read write read write age hit time ty util in out in out 29% 1309 0 0 2644 747 10568 450806 363535 0 0 0s 92% 100% :s 31% 123 1212 0 43230 172098 0 0 18% 278 0 0 2286 435 55557 238412 105938 0 0 0s 94% 100% :s 10% 838 1170 0 47435 103353 0 0 14% 357 0 0 1278 527 13516 110072 20700 0 0 1 94% 100% :s 9% 214 707 0 20912 59405 0 0 17% 506 0 0 3171 611 55294 220118 43358 0 0 0s 93% 100% :s 10% 1052 1613 0 33894 144762 0 0 17% 475 0 0 2025 415 33509 230281 43878 0 0 1 93% 100% :f 14% 587 963 0 20764 109122 0 0 16% 525 0 0 2167 289 4638 78980 62 0 0 1 92% 100% :v 7% 326 1316 0 33645 67102 0 0 24% 766 0 0 2069 681 4712 289509 157377 0 0 0s 94% 93% Zn 16% 289 1014 0 30442 67344 0 0 18% 232 0 0 1518 387 30691 217187 205447 0 0 0s 94% 100% :s 18% 615 671 0 4686 92926 0 0 18% 302 0 0 1830 593 29273 211516 112819 0 0 0s 94% 100% :s 11% 577 951 0 13343 94976 0 0 16% 404 0 0 1381 248 1508 74363 51789 0 0 1s 95% 100% :s 6% 144 833 0 18491 66701 0 0 15% 250 0 0 1437 362 25192 127166 14783 0 0 0s 95% 100% :s 13% 469 718 0 5478 64261 0 0 15% 272 0 0 1315 289 26669 75965 7 0 0 0s 95% 100% :s 5% 488 555 0 9930 56409 0 0 15% 362 0 0 1352 384 9242 126759 56523 0 0 1 92% 100% :s 5% 224 766 0 4169 61645 0 0 17% 393 0 0 1667 527 41316 162630 0 0 0 0s 93% 100% :s 14% 683 591 0 2742 91873 0 0 13% 489 0 0 1847 382 22543 126444 0 0 0 0s 91% 100% :s 10% 416 942 0 13912 98668 0 0 12% 466 0 0 1576 267 1942 125502 22 0 0 >60 92% 100% :s 8% 75 1035 0 25502 105575 0 0 19% 648 0 0 2236 616 12843 180895 7 0 0 1 91% 100% :f 13% 259 1329 0 20573 112682 0 0 17% 589 0 0 2648 732 40612 165832 43495 0 0 1 93% 100% :v 16% 662 1397 0 30105 84571 0 0 13% 298 0 0 1494 224 5258 75630 47 0 0 0s 91% 100% :v 19% 93 1103 0 37230 59553 0 0 25% 961 0 0 2517 611 7662 293316 228741 0 0 0s 95% 93% Zs 24% 70 1486 0 28444 64059 0 0 CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk OTHER FCP iSCSI FCP kB/s iSCSI kB/s in out read write read write age hit time ty util in out in out 18% 713 0 0 1802 308 4102 139025 138664 0 0 1s 93% 100% :s 17% 106 983 0 12370 67131 0 0 18% 653 0 0 2290 356 3024 175784 114858 0 0 0s 93% 100% :s 17% 103 1534 0 30419 103879 0 0 14% 277 0 0 1117 264 835 96862 48373 0 0 0s 93% 100% :s 16% 168 672 0 12616 68979 0 0 13% 465 0 0 1236 616 1505 56551 19 0 0 0s 94% 100% :s 13% 150 621 0 10984 51964 0 0 13% 363 0 0 1184 359 724 47141 6 0 0 1 93% 100% :s 14% 175 646 0 15597 41360 0 0 16% 394 0 0 1192 330 1622 150649 62399 0 0 0s 93% 100% :s 21% 149 649 0 13094 89017 0 0 16% 1660 0 0 2609 1180 3836 101862 22 0 0 1 88% 100% :s 15% 203 746 0 12464 58093 0 0 15% 2306 0 0 3503 1574 23707 144890 0 0 0 0s 93% 100% :s 17% 442 755 0 11702 95250 0 0 14% 1118 0 0 2256 570 30514 97837 33885 0 0 1 95% 100% :f 10% 612 526 0 9866 54222 0 0 14% 3636 0 0 4489 1649 7360 70514 66 0 0 1 92% 100% :v 7% 238 615 0 10162 56678 0 0 13% 3540 0 0 4277 1615 2594 63757 0 0 0 1s 91% 100% :v 6% 129 608 0 14768 48771 0 0 16% 2842 0 0 4097 1682 20595 137085 22 0 0 1 93% 100% :v 14% 416 839 0 14408 78167 0 0 15% 2544 0 0 3760 1540 37961 125949 0 0 0 0s 94% 100% :v 10% 692 524 0 10924 100674 0 0 26% 2493 0 0 4307 991 9312 275089 153829 0 0 0s 94% 90% Zs 22% 329 1485 0 44023 126413 0 0 16% 1624 0 0 2577 1154 3959 179468 99585 0 0 0s 93% 100% :s 20% 145 808 0 20926 89793 0 0 19% 2543 0 0 4807 2015 59604 258735 76443 0 0 0s 94% 100% :s 17% 1042 1222 0 30911 147132 0 0 18% 3354 0 0 4264 1452 1771 189401 92019 0 0 0s 94% 100% :s 7% 93 817 0 14723 84496 0 0 13% 3561 0 0 4435 1553 1027 99965 47137 0 0 0s 93% 100% :s 6% 101 773 0 10691 82650 0 0 18% 2808 0 0 3998 1687 6156 174923 125 0 0 1 92% 100% :s 14% 216 974 0 19335 117237 0 0 18% 4372 0 0 6370 2245 55867 242723 67036 0 0 0s 94% 100% :s 10% 1019 979 0 19915 133940 0 0 Volume $$$$$$$$$$$$$$$$$$$$$$$$$$: Scan id Type of scan progress 6 active bitmap rearrangement fbn 19501 of 21126 w/ max_chain_len 311362705 snap create summary update block 3 of 2112711363427 block ownership calculation block 6400 of 2112611363428 blocks used summary update block 889 of 21127 Volume $$$$$$$$$$$$$$$$$$$$$$$$$$: Scan id Type of scan progress 6 active bitmap rearrangement fbn 19501 of 21126 w/ max_chain_len 311362705 snap create summary update block 5 of 2112711363427 block ownership calculation block 6500 of 2112611363428 blocks used summary update block 2096 of 21127 Volume $$$$$$$$$$$$$$$$$$$$$$$$$$: Scan id Type of scan progress 6 active bitmap rearrangement fbn 19502 of 21126 w/ max_chain_len 311362705 snap create summary update block 4 of 21127 (fbn 211)11363427 block ownership calculation block 7100 of 2112611363428 blocks used summary update block 2799 of 21127 (fbn 3952) This e-mail and its attachments are intended only for the individual or entity to whom it is addressed and may contain information that is confidential, privileged, inside information,or subject to other restrictions on use or disclosure. Any unauthorized use, dissemination or copying of this transmission or the information in it is prohibited and may be unlawful. If you have received this transmission in error, please notify the senderimmediately by return e-mail, and permanently delete or destroy this e-mail, any attachments, and all copies (digital or paper). Unless expressly stated in this e-mail, nothing in this message should be construed as a digital or electronic signature. For additionalimportant disclaimers and disclosures regarding KCG’s products and services, please click on the following link: