We recently upgraded our clustered F740 pair to 5.2.3 from 5.2.2. This necessitated an upgrade of the firmware from 2.1_a2 to 2.2_a2, as well as a disk firmware upgrade from FB37 to FB59.
On one of our filers, the first disk scrub after the upgrade (which ran six days later) found a bunch of parity inconsistencies:
Sun Oct 3 02:50:39 EDT [viking: consumer]: Scrub found 212 parity inconsistencies
Sun Oct 3 02:50:39 EDT [viking: consumer]: Scrub found 0 media errors
Sun Oct 3 02:50:39 EDT [viking: consumer]: Disk scrubbing finished...
Apparently, with 5.2.3 the filer now generates autosupport mail on disk scrubs, so a couple days after this scrub, I received email from NetApp that they had noticed the error. (This is two anomalous autosupport emails in a row that NetApp has opened cases on, so I'd like to acknowledge NA on that.)
NA's recommendation was to re-run a disk scrub (which I did, and it completed w/o finding any errors) and then to run wackz (which I haven't done).
Unfortunately, I cannot find _any_ documentation on wackz on NOW. I've also searched the toasters archive for wackz and didn't find much. In particular, I'd like to know how much downtime I'm going to be saddled with. NA claims that wackz can process 25 million inodes/hour. Given:
viking> df -i
Filesystem      iused    ifree  %iused  Mounted on
/vol/cim0a/   1057771  2593665     29%  /vol/cim0a/
that means wackz should complete in under 10 minutes, even counting free inodes as well as used ones. From what I've heard about wack, I find this hard to believe (I've heard stories of wack running for 8+ hours). Or am I missing something?
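For what it's worth, here is the back-of-the-envelope arithmetic as a small sketch. It assumes the 25 million inodes/hour figure NA quoted is accurate and that runtime scales linearly with inode count; both are assumptions, not anything documented. Since I don't know whether wackz walks only allocated inodes or every inode in the inode file, it computes both cases from the `df -i` numbers above (`estimate_minutes` is just an illustrative helper name).

```python
# Back-of-the-envelope wackz runtime estimate from `df -i` figures.
# ASSUMPTIONS: the 25M inodes/hour rate quoted by NA support holds, and
# runtime scales linearly with the number of inodes processed.

RATE_INODES_PER_HOUR = 25_000_000


def estimate_minutes(inodes: int, rate_per_hour: int = RATE_INODES_PER_HOUR) -> float:
    """Return the estimated minutes needed to process `inodes` at `rate_per_hour`."""
    return inodes / rate_per_hour * 60


# Figures from `df -i` on /vol/cim0a/
iused, ifree = 1_057_771, 2_593_665

# Case 1: wackz only visits allocated inodes.
used_only = estimate_minutes(iused)
# Case 2: wackz visits every inode, allocated or free.
all_inodes = estimate_minutes(iused + ifree)

print(f"used inodes only: {used_only:.1f} min")
print(f"all inodes:       {all_inodes:.1f} min")
```

Either way the estimate comes out to a few minutes, so it is only as good as the quoted rate. That rate is exactly what seems hard to square with the stories about wack.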
I'm interested in hearing from anyone who has run wackz on a similar configuration and how long it took to complete. This filer has 21 9GB disks. The cim0a volume is composed of 3 raid groups (5+1, 5+1, 4+1). The vol0 volume is composed of a single raid group (1+1).
BTW - I'm still not clear on the difference between wack and wackz. The previous explanation to toasters (wackz runs faster, does things in an optimized fashion) wasn't very enlightening. wack must do something above and beyond wackz (or the other way around), otherwise why would both still be included in DOT?
RFE: It would be nice if the filer could run a wack (read-only) and report inconsistencies so that I could check the file system w/o downtime. If the filer requires an immutable filesystem to run wack, then DOT should allow you to tag a whole volume as read-only.
I'd like to thank Ron Thibault of NA for his assistance so far.
j.
--
Jay Soffian jay@cimedia.com UNIX Systems Engineer
404.572.1941 Cox Interactive Media