On 10/11/99 07:41:30 you wrote:
>The volume could be offline, that's for sure, but why the machine ?
This is the result of a less-than-thorough multivolume implementation
plan. Originally, there only *was* one volume, so taking the whole
filer down wasn't an issue. Presumably when mutlivolume was to be
implemented, Netapp decided it was more important to get that out
the door than to spend more time and resources modifying a better
wack interface to run on an offline volume while the …
[View More]system remained
up.
Note that this is also different from a request for a completely
online wack on an active filesystem, or allowing a filesystem to
remain online read-only while being wacked. I still question the
utility of the latter, but it might be good as part of an automated
process strategy (just as snapshots are created for dump). Anyway,
these are all very different enhancements. Netapp will have to
decide which ones to implement, and which ones to implement first.
Bruce
I wonder how is the backup performed by the new client for NetApp...
1. Is the backup done from a special snapshot or from the "live" files?
2. When defining the NetApp client what directive should be used?
I might have missed this information from the docs I have seen.
Thanks in advance,
Itzik
On 10/10/99 12:12:52 you wrote:
>
> "sirbruce" == sirbruce <sirbruce(a)ix.netcom.com> writes:
>
> sirbruce> The problem is that with a changing filesystem, such
> sirbruce> programs could easily report a problem when in fact
> sirbruce> there is none. There are some ways around this.
>
>Huh? If the filesystem is made immutable, it isn't a changing
>filesystem. e.g, on a Unix host, this _should_ be safe:
>
>unmount filesystem.
>mount filesystem read-only.
>run fsck on filesystem.
>remount filesystem read-write.
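A minimal shell sketch of the sequence quoted above. The device and
mount point are hypothetical placeholders, and run() only echoes each
step rather than executing it, so the sketch can be exercised without
root; replace run() with `run() { "$@"; }` to actually perform the
steps.

```shell
#!/bin/sh
# Sketch of the offline read-only fsck sequence described above.
# FS and DEV are hypothetical; substitute your own mount point/device.
FS=/export/home
DEV=/dev/dsk/c0t0d0s7

# Dry-run helper: print each step instead of executing it.
run() { echo "+ $*"; }

run umount "$FS"              # take the filesystem out of service
run mount -o ro "$DEV" "$FS"  # remount it read-only (immutable)
run fsck -n "$DEV"            # check without modifying anything
run umount "$FS"              # unmount again before going read-write
run mount "$DEV" "$FS"        # return to normal read-write service
```

The point of the read-only remount is that fsck then sees a frozen
filesystem, so it cannot mistake in-flight changes for corruption.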
Sure, but for many environments, read-only is not an option. It
may be for you, but for others it's still downtime.
>Why should I expect downtime? A failed disk is a problem, but it
>doesn't cause downtime. A failed power-supply is a problem, but it
>also doesn't cause downtime. A failed head is a problem, but in a
>cluster, no downtime (well, 60 seconds downtime). NA has designed the
>filer to stay up in the face of these problems. So if NA has a check
>list of problems and it is working its way down the check list to keep
>filers up in the face of these problems, then "file-system
>health-check and fix" needs to be added to that list. Sure, it isn't a
>common occurrence, but clearly it happens often enough for NA to have
>written wack and constantly improved it over the years. I'm arguing
>that the next improvement is to allow wack to be run on an on-line
>filer.
Upgrading your software/firmware/disk firmware is more common. So
is adding new cards into the system. Yet you expect downtime on
those. So if what was most common was the measure of importance,
then these should be worked on before an on-line wack.
I agree online wack is a good thing to have, but I guess I'm
satisfied in seeing Netapp concentrate on other bugs and features
first. I wasn't arguing that it shouldn't be done.
Bruce
On 10/10/99 02:59:52 you wrote:
>
> "alexei" == alexei <alexei(a)mindspring.net> writes:
>
>> this rather distressing. The filer really needs a way to perform a
>> filesystem health check w/o downtime.
>
> alexei> Like running fsck on a mounted filesystem? Some things are
> alexei> better done in a quiesced state...
>
>The filer is not a Unix host serving NFS. I expect more out of it. I
>didn't say it needed to correct errors while serving content, I said
>it needed to be able to do a health-check. If you go back to my
>original message, I mentioned that if it required an immutable
>filesystem to do this, then you should be able to tag a filesystem
>(not just an export) read only.
The raid scrubbing is indeed meant to be such a check, although it is
not filesystem-based.
>You can btw, run fsck on a mounted filesystem. Solaris happily runs
>'fsck -n' on a mounted file system (yes, I know, it will also happily
>run 'rm -rf /' which doesn't mean you should do it - lots of rope and
>whatnot). Linux will run e2fsck after making some noise. I don't see
>any reason why this would be dangerous on a filesystem mounted read
>only.
The problem is that with a changing filesystem, such programs could
easily report a problem when in fact there is none. There are some
ways around this.
Personally, while I think this should be on Netapp's agenda, there
are more important things as well. Wack has been improved and now
runs much faster than before. You should expect some downtime to
happen when problems occur; having parity inconsistencies is *not*
a normal occurrence and should not happen often.
Bruce
Is there a way to force the Netapp to forget about specific locks?
I know how to get the Netapp to forget about *all* locks from a
particular client, but I don't always want that.
My specific problem is with the FrontPage extensions on Solaris
(using Apache as the web server). Once in a while, the extensions
think there is another instance running. It determines this by
attempting to lock a file called vti_pvt/service.lck .
I can run lock_dump on the Netapp and see the inode and client-side
pid that was granted the lock, but in some cases that pid is no
longer extant. All the FP documentation I've seen says "kill off the
hung process", which doesn't apply here. I've been able to work
around the problem each time simply by renaming/removing the
service.lck file, but I thought I should at least investigate the
possibility of removing entries from the Netapp's table of locks.
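The rename workaround described above can be scripted defensively:
move the stale lock aside rather than deleting it, so it can still be
inspected later. The paths are hypothetical (FrontPage keeps the lock
under the web root's _vti_pvt directory), and for demonstration the
sketch defaults DOCROOT to a scratch directory and plants a fake lock
there first.

```shell
#!/bin/sh
# Sketch: move a stale FrontPage service.lck aside instead of
# deleting it, so it can be inspected (or restored) later.
# DOCROOT defaults to a scratch directory for demonstration;
# the real web document root is site-specific.
DOCROOT=${DOCROOT:-$(mktemp -d)}
mkdir -p "$DOCROOT/_vti_pvt"
touch "$DOCROOT/_vti_pvt/service.lck"   # simulate a stale lock

LCK="$DOCROOT/_vti_pvt/service.lck"
if [ -f "$LCK" ]; then
    mv "$LCK" "$LCK.stale"              # keep it around under a new name
    echo "moved $LCK aside"
else
    echo "no lock file at $LCK"
fi
```

Renaming (rather than removing) also means the filer's lock entry
keeps pointing at an inode you can still find if you need to compare
it against lock_dump output.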
--
Brian Tao (BT300, taob(a)risc.org)
"Though this be madness, yet there is method in't"
Jay Soffian <jay(a)cimedia.com> writes:
> Do parity errors during a disk scrub necessarily indicate something is
> wrong with the filesystem?
Not necessarily.
> this rather distressing. The filer really needs a way to perform a
> filesystem health check w/o downtime.
Like running fsck on a mounted filesystem? Some things are better
done in a quiesced state...
>NA also needs to publish
> accurate timing data on wack via NOW. The only document I can find has
> ancient data.
There are way too many variables to that equation. It depends on the version
of the OS, the version of wack, etc. I have seen times when running the
incorrect version of wack yielded 3-hour wacks (before someone mentioned
this should not be the case; we restarted with the proper version).
A similar table that would be nice is raid reconstruct speeds based
on volume size, filer load and disk size. Now that would be interesting... :)
Alex
"Eyal Traitel" <r55789(a)email.sps.mot.com> writes:
> It created a complete havoc for our site when we were with tcp (apps hangups
> etc.)
There were also some pretty substantial bugs with NFS over TCP.
For a while NetApp shipped the filers with the option set to on,
but after a while they set it to off by default.
The problem I had experienced (a while ago; I have not needed to
look into the tcp option again) was that after a while the client would
not be able to talk to the filer. Something about the sequence numbers
not being quite right.
At any rate, thorough testing with your client sets would be a good thing
to do before going live with it.
Alex
> > "Puneet" == Puneet Anand <puneet(a)netapp.com> writes:
> >
> > Puneet> Jay, This might help
> >
> > Puneet> http://now.netapp.com/NOW/knowledge/contents/TIP/TIP_502.shtml
> >
> > Excellent. FYI - A search on "Downgrading" from NOW with all options
> > checked didn't return that document.
> >
> > j.
>
> Hi Jay,
> downgrading would not return the document but "revert"
> would. Hope that helps.
>
> No, not really. The issue is that I wasn't able to find this document
> using the search engine. The process I was looking for is called
> downgrading, not reverting. The title of the document is "Downgrading
> the Filer from Data ONTAP 5.3 through 3.1.6". I would therefore expect
> a search on "downgrading" to return this document. While the filer
> command is "revert_to", I didn't know this (if I did, I probably
> wouldn't have needed this document in the first place).
Ok. The point of sending the info was simply to give you the
term to search on if you should ever need the document/procedure
again. Sorry for any mis-communication. Just trying to be
helpful.
Indeed, you are correct: the process is downgrading. I guess the
term "revert" is a Network Appliance term. I will cc
now-admin(a)netapp.com and have them add the meta-tags
"downgrade" and "downgrading" to the document so that a search
on those terms returns the document you had hoped to find.
--
~~~~~~~~~~~~~~~~~
Mike Smith
mikesmit(a)netapp.com
http://now.netapp.com
408-822-4755
Technical Support Engineer
~~~~~~~~~~~~~~~~~
>
> Thanks.
>
> j.
> --
> Jay Soffian <jay(a)cimedia.com> UNIX Systems Engineer
> 404.572.1941 Cox Interactive Media
All our filers do NFS using UDP. If I
turn the nfs.tcp.enable option to on,
does it *add* that capability, or does it
do it one way or the other?
Thanks,
Graham
On 10/08/99 10:59:26 you wrote:
>In searching for a bug on NOW, I've noticed that a majority of bugs
>are still in the open state. Does this mean these bugs have not been
>fixed?
I always assumed open bugs were the ones that have not been fixed in
a current release, yes. Note that NOW doesn't list all bugs, either,
so there are probably a lot of bugs that are fixed that never make it
on there. Customers probably want to know more about known bugs with
workarounds than about those without workarounds or those that have
already been fixed.
>What does the 'release fix' column refer to? Does this mean the bug
>applies only to that specific release, or that the bug was fixed in
>that release, or that the bug was first noticed in that release?
I always thought it meant the bug is (believed to be) fixed in that
release (and all future ones based from it).
>Are bugs numbered in a strictly sequential order?
They used to be.
>If so, how is it
>that bug #289 applies to release 5.3?
I'm not sure I understand the question. Are you surprised that a
bug discovered years ago may have not yet been fixed? That's not
unusual if the bug is rare or difficult to replicate, or requires
some new features to resolve the problem.
>It would be nice to see the bug tracking system be setup more clearly,
>perhaps with the following fields:
>
>Date bug opened.
>Date bug closed.
>First release bug noticed.
>Releases which contain bug.
>First release which fixed bug.
>State: (opened, analyzed, solved, fix incorporated into release).
I thought NOW did contain that information (other than the dates), or
was at least supposed to in the ideal world. Note it is a non-trivial
problem to list all releases which contain a bug, especially before the
relevant code has been analyzed (which is generally very close to the
time it would be fixed). So for an open bug, it is often hard to
trust that the bug isn't in your release as well, whether yours is
newer or older than the one in which the problem was first spotted.
>I'd have to guess that the NOW bug system is not the same system used
>internally by NA engineers.
It used to be based on, or integrated with, that system. Perhaps that has changed.
Bruce