On 10/11/99 07:41:30 you wrote:
>The volume could be offline, that's for sure, but why the machine ?
This is the result of a less-than-thorough multivolume implementation
plan. Originally, there only *was* one volume, so taking the whole
filer down wasn't an issue. Presumably when mutlivolume was to be
implemented, Netapp decided it was more important to get that out
the door than to spend more time and resources modifying a better
wack interface to run on an offline volume while the …
[View More]system remained
up.
Note that this is also different from a request for a completely
online wack on an active filesystem, or allowing a filesystem to
remain online read-only while being wacked. I still question the
utility of the latter, but it might be good as part of an automated
process strategy (just as snapshots are created for dump). Anyway,
these are all very different enhancements. Netapp will have to
decide which ones to implement, and which ones to implement first.
Bruce
I wonder how is the backup performed by the new client for NetApp...
1. Is the backup done from a special snapshot or from the "live" files?
2. When defining the NetApp client what directive should be used?
I might have missed this information from the docs I have seen.
Thanks in advance,
Itzik
On 10/10/99 12:12:52 you wrote:
>
> "sirbruce" == sirbruce <sirbruce(a)ix.netcom.com> writes:
>
> sirbruce> The problem is that with a changing filesystem, such
> sirbruce> programs could easily report a problem when in fact
> sirbruce> there is none. There are some ways around this.
>
>Huh? If the filesystem is made immutable, it isn't a changing
>filesystem. e.g, on a Unix host, this _should_ be safe:
>
>unmount filesystem.
>mount filesystem read-only.
>run fsck on filesystem.
>remount filesystem read-write.
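A minimal shell sketch of the sequence quoted above. The device and
mount point are hypothetical placeholders, and run() only echoes each
step rather than executing it, so the sketch can be exercised without
root; replace run() with `run() { "$@"; }` to actually perform the
steps.

```shell
#!/bin/sh
# Sketch of the offline read-only fsck sequence described above.
# FS and DEV are hypothetical; substitute your own mount point/device.
FS=/export/home
DEV=/dev/dsk/c0t0d0s7

# Dry-run helper: print each step instead of executing it.
run() { echo "+ $*"; }

run umount "$FS"              # take the filesystem out of service
run mount -o ro "$DEV" "$FS"  # remount it read-only (immutable)
run fsck -n "$DEV"            # check without modifying anything
run umount "$FS"              # unmount again before going read-write
run mount "$DEV" "$FS"        # return to normal read-write service
```

The point of the read-only remount is that fsck then sees a frozen
filesystem, so it cannot mistake in-flight changes for corruption.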
Sure, but for many environments, read-only is not an option. It
may be for you, but for others it's still downtime.
>Why should I expect downtime? A failed disk is a problem, but it
>doesn't cause downtime. A failed power-supply is a problem, but it
>also doesn't cause downtime. A failed head is a problem, but in a
>cluster, no downtime (well, 60 seconds downtime). NA has designed the
>filer to stay up in the face of these problems. So if NA has a check
>list of problems and it is working its way down the check list to keep
>filers up in the face of these problems, then "file-system
>health-check and fix" needs to be added to that list. Sure, it isn't a
>common occurrence, but clearly it happens often enough for NA to have
>written wack and constantly improved it over the years. I'm arguing
>that the next improvement is to allow wack to be run on an on-line
>filer.
Upgrading your software/firmware/disk firmware is more common. So
is adding new cards into the system. Yet you expect downtime on
those. So if what was most common was the measure of importance,
then these should be worked on before an on-line wack.
I agree online wack is a good thing to have, but I guess I'm
satisfied in seeing Netapp concentrate on other bugs and features
first. I wasn't arguing that it shouldn't be done.
Bruce
On 10/10/99 02:59:52 you wrote:
>
> "alexei" == alexei <alexei(a)mindspring.net> writes:
>
>> this rather distressing. The filer really needs a way to perform a
>> filesystem health check w/o downtime.
>
> alexei> Like running fsck on a mounted filesystem? Some things are
> alexei> better done in a quiesced state...
>
>The filer is not a Unix host serving NFS. I expect more out of it. I
>didn't say it needed to correct errors while serving content, I said
>it needed to be able to do a health-check. If you go back to my
>original message, I mentioned that if it required an immutable
>filesystem to do this, then you should be able to tag a filesystem
>(not just an export) read only.
The raid scrubbing is indeed meant to be such a check, although it is
not filesystem-based.
>You can btw, run fsck on a mounted filesystem. Solaris happily runs
>'fsck -n' on a mounted file system (yes, I know, it will also happily
>run 'rm -rf /' which doesn't mean you should do it - lots of rope and
>whatnot). Linux will run e2fsck after making some noise. I don't see
>any reason why this would be dangerous on a filesystem mounted read
>only.
The problem is that with a changing filesystem, such programs could
easily report a problem when in fact there is none. There are some
ways around this.
Personally, while I think this should be on Netapp's agenda, there
are more important things as well. Wack has been improved and now
runs much faster than before. You should expect some downtime to
happen when problems occur; having parity inconsistencies is *not*
a normal occurrence and should not happen often.
Bruce
Is there a way to force the Netapp to forget about specific locks?
I know how to get the Netapp to forget about *all* locks from a
particular client, but I don't always want that.
My specific problem is with the FrontPage extensions on Solaris
(using Apache as the web server). Once in a while, the extensions
think there is another instance running. It determines this by
attempting to lock a file called vti_pvt/service.lck .
I can run lock_dump on the Netapp and see the inode and client-side
pid that was granted the lock, but in some cases that pid is no
longer extant. All the FP documentation I've seen says "kill off the
hung process", which doesn't apply here. I've been able to work
around the problem each time simply by renaming/removing the
service.lck file, but I thought I should at least investigate the
possibility of removing entries from the Netapp's table of locks.
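The rename workaround described above can be scripted defensively:
move the stale lock aside rather than deleting it, so it can still be
inspected later. The paths are hypothetical (FrontPage keeps the lock
under the web root's _vti_pvt directory), and for demonstration the
sketch defaults DOCROOT to a scratch directory and plants a fake lock
there first.

```shell
#!/bin/sh
# Sketch: move a stale FrontPage service.lck aside instead of
# deleting it, so it can be inspected (or restored) later.
# DOCROOT defaults to a scratch directory for demonstration;
# the real web document root is site-specific.
DOCROOT=${DOCROOT:-$(mktemp -d)}
mkdir -p "$DOCROOT/_vti_pvt"
touch "$DOCROOT/_vti_pvt/service.lck"   # simulate a stale lock

LCK="$DOCROOT/_vti_pvt/service.lck"
if [ -f "$LCK" ]; then
    mv "$LCK" "$LCK.stale"              # keep it around under a new name
    echo "moved $LCK aside"
else
    echo "no lock file at $LCK"
fi
```

Renaming (rather than removing) also means the filer's lock entry
keeps pointing at an inode you can still find if you need to compare
it against lock_dump output.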
--
Brian Tao (BT300, taob(a)risc.org)
"Though this be madness, yet there is method in't"
Jay Soffian <jay(a)cimedia.com> writes:
> Do parity errors during a disk scrub necessarily indicate something is
> wrong with the filesystem?
Not necessarily.
> this rather distressing. The filer really needs a way to perform a
> filesystem health check w/o downtime.
Like running fsck on a mounted filesystem? Some things are better
done in a quiesced state...
>NA also needs to publish
> accurate timing data on wack via NOW. The only document I can find has
> ancient data.
There are way too many variables to that equation. It depends on the version
of the OS, the version of wack, etc. I have seen times when running the
incorrect version of wack yielded 3-hour wacks (before someone mentioned
this should not be the case; we restarted with the proper version).
A similar table that would be nice is raid reconstruct speeds based
on volume size, filer load and disk size. Now that would be interesting... :)
Alex
"Eyal Traitel" <r55789(a)email.sps.mot.com> writes:
> It created a complete havoc for our site when we were with tcp (apps hangups
> etc.)
There were also some pretty substantial bugs with NFS over TCP.
For a while NetApp shipped the filers with the option set to on,
but after a while they set it to off by default.
The problem I had experienced (a while ago; I have not needed to
look into the tcp option again) was that after a while the client would
not be able to talk to the filer. Something about the sequence numbers
not being quite right.
At any rate, thorough testing with your client sets would be a good thing
to do before going live with it.
Alex
> > "Puneet" == Puneet Anand <puneet(a)netapp.com> writes:
> >
> > Puneet> Jay, This might help
> >
> > Puneet> http://now.netapp.com/NOW/knowledge/contents/TIP/TIP_502.shtml
> >
> > Excellent. FYI - A search on "Downgrading" from NOW with all options
> > checked didn't return that document.
> >
> > j.
>
> Hi Jay,
> downgrading would not return the document but "revert"
> would. Hope that helps.
>
> No, not really. The issue is that I wasn't able to find this document
> using the search engine. The process I was looking for is called
> downgrading, not reverting. The title of the document is "Downgrading
> the Filer from Data ONTAP 5.3 through 3.1.6". I would therefore expect
> a search on "downgrading" to return this document. While the filer
> command is "revert_to", I didn't know this (if I did, I probably
> wouldn't have needed this document in the first place).
Ok. The point of sending the info was simply to give you the
term to search on if you should ever need the document/procedure
again. Sorry for any mis-communication. Just trying to be
helpful.
Indeed, you are correct: the process is downgrading. I guess the
term "revert" is a Network Appliance term. I will cc
now-admin(a)netapp.com and have them add the meta-tags
"downgrade" and "downgrading" to the document so that a search
on those terms returns the document you had hoped to find.
--
~~~~~~~~~~~~~~~~~
Mike Smith
mikesmit(a)netapp.com
http://now.netapp.com
408-822-4755
Technical Support Engineer
~~~~~~~~~~~~~~~~~
>
> Thanks.
>
> j.
> --
> Jay Soffian <jay(a)cimedia.com> UNIX Systems Engineer
> 404.572.1941 Cox Interactive Media
All our filers do NFS using UDP. If I
turn the nfs.tcp.enable option to on,
does it *add* that capability, or does it
do it one way or the other?
Thanks,
Graham
On 10/08/99 10:59:26 you wrote:
>In searching for a bug on NOW, I've noticed that a majority of bugs
>are still in the open state. Does this mean these bugs have not been
>fixed?
I always assumed open bugs were the ones that have not been fixed in
a current release, yes. Note that NOW doesn't list all bugs, either,
so there are probably a lot of bugs that are fixed that never make it
on there. Customers probably want to know more about known bugs with
workarounds than about those without workarounds or those that have
already been fixed.
>What does the 'release fix' column refer to? Does this mean the bug
>applies only to that specific release, or that the bug was fixed in
>that release, or that the bug was first noticed in that release?
I always thought it meant the bug is (believed to be) fixed in that
release (and all future ones based from it).
>Are bugs numbered in a strictly sequential order?
They used to be.
>If so, how is it
>that bug #289 applies to release 5.3?
I'm not sure I understand the question. Are you surprised that a
bug discovered years ago may have not yet been fixed? That's not
unusual if the bug is rare or difficult to replicate, or requires
some new features to resolve the problem.
>It would be nice to see the bug tracking system be setup more clearly,
>perhaps with the following fields:
>
>Date bug opened.
>Date bug closed.
>First release bug noticed.
>Releases which contain bug.
>First release which fixed bug.
>State: (opened, analyzed, solved, fix incorporated into release).
I thought NOW did contain that information (other than the dates), or
was at least supposed to in the ideal world. Note it is a non-trivial
problem to list all releases which contain a bug, especially before the
relevant code has been analyzed (which is generally very close to the
time it would be fixed). So for an open bug, it is often hard to
trust that the bug isn't in your release as well, whether yours is
newer or older than the one in which the problem was first spotted.
>I'd have to guess that the NOW bug system is not the same system used
>internally by NA engineers.
It used to be based on, or integrated with, that system. Perhaps that has changed.
Bruce