"sirbruce" == sirbruce <sirbruce@ix.netcom.com> writes:
sirbruce> The problem is that with a changing filesystem, such
sirbruce> programs could easily report a problem when in fact
sirbruce> there is none. There are some ways around this.
Huh? If the filesystem is made immutable, it isn't a changing filesystem. E.g., on a Unix host, this _should_ be safe:
    unmount filesystem.
    mount filesystem read-only.
    run fsck on filesystem.
    remount filesystem read-write.
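
The sequence above could be scripted roughly as follows. This is only a sketch under assumptions: the device and mount point are made up, `-o remount,rw` is Linux mount syntax, and `fsck -n` (check without modifying anything) is assumed to be supported by the local fsck; other Unix flavors spell these differently. The function names are hypothetical. By default it only prints the commands, and runs them only when asked.

```python
import shlex
import subprocess


def offline_check_plan(device, mount_point):
    """Return the four commands, in order, as argument lists."""
    return [
        ["umount", mount_point],
        ["mount", "-o", "ro", device, mount_point],
        # -n: answer "no" to every repair prompt, i.e. report only.
        ["fsck", "-n", device],
        # Linux-style remount; assumed syntax, adjust for your Unix.
        ["mount", "-o", "remount,rw", mount_point],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it too unless dry_run is set.

    With check=True a non-zero exit (e.g. fsck reporting errors)
    raises CalledProcessError and stops the sequence, so a damaged
    filesystem is left mounted read-only rather than going back
    read-write.
    """
    shown = []
    for cmd in plan:
        line = shlex.join(cmd)
        print(line)
        shown.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return shown


if __name__ == "__main__":
    # Hypothetical device and mount point, dry run only.
    run_plan(offline_check_plan("/dev/sdb1", "/export/home"))
```

Stopping on a failed fsck is a deliberate choice here: if the check finds something, going back to read-write is the last thing you want to do automatically.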
sirbruce> Personally, while I think this should be on Netapp's
sirbruce> agenda, there are more important things as well. Wack
sirbruce> has been improved and now runs much faster than before.
Agreed. wackz from 5.2.3D1 (what NA had me run) completed on the filer in question in ~15 minutes. This is an F740. The filesystem checked is 105GB, composed of three raid-groups (5+1, 5+1, 4+1), with 1067656 of 3651436 inodes used.
BTW - no errors were found by wackz, in spite of the 212 parity errors corrected a week earlier.
sirbruce> You should expect some downtime to happen when problems
sirbruce> occur; having parity inconsistencies is *not* a normal
sirbruce> occurrence and should not happen often.
Why should I expect downtime? A failed disk is a problem, but it doesn't cause downtime. A failed power supply is a problem, but it also doesn't cause downtime. A failed head is a problem, but in a cluster it means no downtime (well, 60 seconds of downtime). NA has designed the filer to stay up in the face of these problems.

So if NA has a checklist of problems and is working its way down that list to keep filers up in the face of them, then "file-system health-check and fix" needs to be added to the list. Sure, it isn't a common occurrence, but clearly it happens often enough for NA to have written wack and to have kept improving it over the years. I'm arguing that the next improvement is to allow wack to be run on an on-line filer.
j.