On 10/10/99 12:12:52 you wrote:
>
> "sirbruce" == sirbruce <sirbruce(a)ix.netcom.com> writes:
>
> sirbruce> The problem is that with a changing filesystem, such
> sirbruce> programs could easily report a problem when in fact
> sirbruce> there is none. There are some ways around this.
>
>Huh? If the filesystem is made immutable, it isn't a changing
>filesystem. E.g., on a Unix host, this _should_ be safe:
>
>unmount filesystem.
>mount filesystem read-only.
>run fsck on filesystem.
>remount filesystem read-write.

Sure, but in many environments mounting read-only is not an option.
It may work for you, but for others a read-only filesystem is still
downtime.

>Why should I expect downtime? A failed disk is a problem, but it
>doesn't cause downtime. A failed power-supply is a problem, but it
>also doesn't cause downtime. A failed head is a problem, but in a
>cluster, no downtime (well, 60 seconds downtime). NA has designed the
>filer to stay up in the face of these problems. So if NA has a check
>list of problems and it is working its way down the check list to keep
>filers up in the face of these problems, then "file-system
>health-check and fix" needs to be added to that list. Sure, it isn't a
>common occurrence, but clearly it happens often enough for NA to have
>written wack and constantly improved it over the years. I'm arguing
>that the next improvement is to allow wack to be run on an on-line
>filer.

Upgrading your software, firmware, or disk firmware is more common.
So is adding new cards to the system. Yet you expect downtime for
those. So if frequency were the measure of importance, those
should be worked on before an on-line wack.

I agree on-line wack is a good thing to have, but I'm content to
see Netapp concentrate on other bugs and features first. I wasn't
arguing that it shouldn't be done.

Bruce