On 10/10/99 12:12:52 you wrote:
"sirbruce" == sirbruce <sirbruce@ix.netcom.com> writes:
sirbruce> The problem is that with a changing filesystem, such
sirbruce> programs could easily report a problem when in fact
sirbruce> there is none. There are some ways around this.
Huh? If the filesystem is made immutable, it isn't a changing filesystem. For example, on a Unix host, this _should_ be safe:
  unmount filesystem.
  mount filesystem read-only.
  run fsck on filesystem.
  remount filesystem read-write.
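As a rough sketch of the four steps above, the Python below builds the command sequence and, by default, only prints it. The Linux-style option spellings (`-o ro`, `-o remount,rw`, `fsck -n`) are assumptions, as are the device and mount point in the usage line; other Unix variants spell these differently, so treat this as an illustration rather than a tested procedure.

```python
import subprocess

def fsck_plan(device, mountpoint):
    """Return the commands for a read-only check, in order.

    Assumes Linux-style mount/fsck options; adjust for other Unixes.
    """
    return [
        ["umount", mountpoint],                     # unmount filesystem
        ["mount", "-o", "ro", device, mountpoint],  # mount it read-only
        ["fsck", "-n", device],                     # check only, change nothing
        ["mount", "-o", "remount,rw", mountpoint],  # back to read-write
    ]

def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command, so a
            # filesystem that fails fsck is not remounted read-write.
            subprocess.run(cmd, check=True)
```

For instance, `run_plan(fsck_plan("/dev/sdb1", "/export/home"))` prints the four commands for a hypothetical device and mount point without running anything.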
Sure, but for many environments, read-only is not an option. It may be for you, but for others it's still downtime.
Why should I expect downtime? A failed disk is a problem, but it doesn't cause downtime. A failed power supply is a problem, but it also doesn't cause downtime. A failed head is a problem, but in a cluster, no downtime (well, 60 seconds of downtime). NA has designed the filer to stay up in the face of these problems. So if NA has a checklist of problems and is working its way down that checklist to keep filers up in the face of them, then "file-system health-check and fix" needs to be added to the list. Sure, it isn't a common occurrence, but clearly it happens often enough for NA to have written wack and constantly improved it over the years. I'm arguing that the next improvement is to allow wack to be run on an on-line filer.
Upgrading your software/firmware/disk firmware is more common. So is adding new cards to the system. Yet you expect downtime for those. So if how common a problem is were the measure of importance, then these should be worked on before an on-line wack.
I agree on-line wack is a good thing to have, but I guess I'm satisfied to see NetApp concentrate on other bugs and features first. I wasn't arguing that it shouldn't be done.
Bruce