We recently upgraded our clustered F740 pair to 5.2.3 from 5.2.2. This necessitated an upgrade of the firmware from 2.1_a2 to 2.2_a2, as well as a disk firmware upgrade from FB37 to FB59.
On one of our filers, the first disk scrub that ran after the upgrade (6 days after the upgrade) found a bunch of parity inconsistencies:
Sun Oct 3 02:50:39 EDT [viking: consumer]: Scrub found 212 parity inconsistencies
Sun Oct 3 02:50:39 EDT [viking: consumer]: Scrub found 0 media errors
Sun Oct 3 02:50:39 EDT [viking: consumer]: Disk scrubbing finished...
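For anyone unfamiliar with what a "parity inconsistency" means here: conceptually, the scrub recomputes the XOR of the data blocks in each stripe and compares it against the stored parity block; a mismatch gets counted. Here's a rough conceptual sketch in Python - illustrative only, not ONTAP's scrub code, and the block size and group width are arbitrary:

    # Illustrative sketch of a RAID-4 style parity scrub check.
    # NOT Data ONTAP's implementation; block size and group shape are made up.
    from functools import reduce

    def compute_parity(data_blocks):
        """Parity is the byte-wise XOR of all data blocks in the stripe."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_blocks))

    def stripe_is_consistent(data_blocks, stored_parity):
        """False here is what the scrub would count as a parity inconsistency."""
        return compute_parity(data_blocks) == stored_parity

    # Example: a 5+1 raid group like the ones on this filer, 4 KB blocks.
    data = [bytes([i] * 4096) for i in range(5)]
    parity = compute_parity(data)
    print(stripe_is_consistent(data, parity))       # True  -> clean stripe
    print(stripe_is_consistent(data, bytes(4096)))  # False -> counted by the scrub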
Apparently, with 5.2.3 the filer now generates autosupport mail on disk scrubs, so a couple days after this scrub, I received email from NetApp that they had noticed the error. (This is two anomalous autosupport emails in a row that NetApp has opened cases on, so I'd like to acknowledge NA on that.)
NA's recommendation was to re-run a disk scrub (which I did, and it completed w/o finding any errors) and then to run wackz (which I haven't done).
Unfortunately, I cannot find _any_ documentation on wackz on NOW. I've also searched the toasters archive for wackz and didn't find much. In particular, I'd like to know how much downtime I'm going to be saddled with. NA claims that wackz can process 25 million inodes/hour. Given:
viking> df -i
Filesystem             iused     ifree  %iused  Mounted on
/vol/cim0a/          1057771   2593665     29%  /vol/cim0a/
that means wackz should complete in under 10 minutes. From what I've heard about wack, I find this hard to believe (I've heard stories of wack running for 8+ hours). Or am I missing something?
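Here's the arithmetic behind that estimate, as a quick Python sketch. The 25 million inodes/hour rate is NA's claim, not something I've measured, and whether wackz walks only used inodes or the whole inode file is my assumption, so both cases are shown:

    # Back-of-the-envelope check of the "under 10 minutes" estimate.
    iused = 1_057_771          # used inodes on /vol/cim0a, from df -i above
    ifree = 2_593_665          # free inodes on /vol/cim0a
    claimed_rate = 25_000_000  # inodes per hour, per NA's claim

    print(f"used inodes only: {iused / claimed_rate * 60:.1f} minutes")            # ~2.5
    print(f"used + free:      {(iused + ifree) / claimed_rate * 60:.1f} minutes")  # ~8.8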
I'm interested in hearing from anyone who has run wackz on a similar configuration and how long it took to complete. This filer has 21 9GB disks. The cim0a volume is composed of 3 raid groups (5+1, 5+1, 4+1). The vol0 volume is composed of a single raid group (1+1).
BTW - I'm still not clear on the difference between wack and wackz. The previous explanation to toasters (wackz runs faster, does things in an optimized fashion) wasn't very enlightening. wack must do something above and beyond wackz (or the other way around), otherwise why would both still be included in DOT?
RFE: It would be nice if the filer could run a wack (read-only) and report inconsistencies so that I could check the file system w/o downtime. If the filer requires an immutable filesystem to run wack, then DOT should allow you to tag a whole volume as read-only.
I'd like to thank Ron Thibault of NA for his assistance so far.
j.
--
Jay Soffian  jay@cimedia.com
UNIX Systems Engineer  404.572.1941
Cox Interactive Media
Hi,
I will try to address a couple of points, although I am sure the folks at NetApp will give you a much more academic answer:
wackz is similar to wack, but it works on zombie files as well as regular files (or should I say inodes). You might ask what a zombie file/inode is - great question. The best I can offer is that it is a file that was semi-erased from the system - something marked as a "bad block". I expect NA to come up with a more sophisticated answer. The regular wack command does not check these files/inodes (at least in the ONTAP versions available today).
On the system where we ran wackz, we had 9,345,431 inodes and about 600 GB of data; it took somewhere in the vicinity of 60 minutes under ONTAP 5.3.2. The improvement in time is related to the change in ONTAP versions, not to a change in the wack command itself.
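For comparison with the 25 million inodes/hour figure quoted above, the throughput implied by that run works out to roughly 9.3 million inodes/hour (my arithmetic, assuming the run time was dominated by inode processing):

    # Implied wackz throughput from the run described above.
    # Assumption: the ~60 minutes was mostly inode processing.
    inodes = 9_345_431
    hours = 60 / 60.0
    print(f"{inodes / hours:,.0f} inodes/hour")   # ~9.3 million/hour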
I have heard that the wackz command will very likely become the default in some future, as-yet-undisclosed ONTAP release.
wackz is similar to wack, but it works on zombie files as well as regular files (or should I say inodes). You might ask what a zombie file/inode is - great question. The best I can offer is that it is a file that was semi-erased from the system - something marked as a "bad block". I expect NA to come up with a more sophisticated answer.
E.g.:
All the file system updates for a delete must be written out in a single consistency point[1], so that the entire delete takes place atomically.
This means that whatever CP (consistency point) will write out those changes cannot complete until the delete finishes.
A CP is what pushes file system changes to disk; until a CP completes, the NVRAM cannot be purged of entries for the file system operations that made those changes, because if the CP doesn't complete before a system reboot (so that none of the file system changes are visible in the file system), all of those operations must be replayed on the reboot.
A delete of a very large file can take quite a while; this means that a CP that would write out the file system changes for said delete can take quite a while, which means it could take quite a while to purge file system operations from the NVRAM.
If the NVRAM is full, this means it could take a while before the system is able to perform any file system operations that get logged to NVRAM - which includes just about every file system operation (including reads - they get logged so that file access time updates don't get lost, at least if you haven't disabled access time updating).
Therefore, a large delete can keep a filer from getting any work done for a while, if the NVRAM backs up.
To fix this, we introduced a mechanism that allows a partially-completed delete to be written out in a CP; the "half-dead" file is a "zombie", and an invisible metadata file keeps track of "zombie" files so that any such files detected when the system reboots can be finished off.
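To make the trade-off concrete, here's a toy model of the two approaches (my own simplification in Python, not Data ONTAP code; the per-CP cleanup limit and block counts are made-up numbers purely for illustration):

    # Toy model: atomic delete vs. zombie-style delete.
    # FREES_PER_CP is an invented bound on cleanup work folded into one CP.
    FREES_PER_CP = 1000

    def atomic_delete(file_blocks):
        """All block frees must land in a single CP, however long that takes."""
        return {"cps": 1, "max_frees_in_one_cp": file_blocks}

    def zombie_delete(file_blocks):
        """One quick CP links the file into the hidden zombie metafile;
        later CPs each free a bounded chunk, so no single CP is held up."""
        cleanup_cps = -(-file_blocks // FREES_PER_CP)  # ceiling division
        return {"cps": 1 + cleanup_cps, "max_frees_in_one_cp": FREES_PER_CP}

    huge_file = 5_000_000  # blocks in a very large file (made-up size)
    print(atomic_delete(huge_file))  # one CP stuck doing 5,000,000 frees
    print(zombie_delete(huge_file))  # 5,001 CPs, none doing more than 1,000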
[1] For a brief description of the way "consistency points" work, see
http://www.netapp.com/tech_library/3001.html#I43
which links to part of the WAFL section of the "An NFS File Server Appliance" paper by Dave Hitz in our technical library; the technical library is linked to by the teeny "Tech" link next to "Contact Us" at the bottom of our home page. I guess the idea was not to do technical stuff in public, as it'd frighten the horses^H^H^H^H^H^HPointy-Haired Bosses who probably sign the purchase orders for all the big E-word systems we're selling....
"Jay" == Jay Soffian jay@cimedia.com writes:
Jay> On one of our filers, the first disk scrub that ran after the
Jay> upgrade (6 days after the upgrade) found a bunch of parity
Jay> inconsistencies:
Some additional questions:
Do parity errors during a disk scrub necessarily indicate something is wrong with the filesystem?
Is there any way to verify the health of a filesystem short of running wack? All outward appearances indicate that the filesystem is okay.
NA is now recommending that I run wackz off of 5.3.1D1 ASAP ('tonight or tomorrow night'). For a device we count on for 100% uptime, I find this rather distressing. The filer really needs a way to perform a filesystem health check w/o downtime. NA also needs to publish accurate timing data on wack via NOW. The only document I can find has ancient data.
j.
--
Jay Soffian  jay@cimedia.com
UNIX Systems Engineer  404.572.1941
Cox Interactive Media