Hi,
We have two netapp filers serving mainly static data (updated overnight)
to several hp-ux clients.
If one server fails we can swap the clients onto the other server by
mounting and switching symbolic links.
eg. on a client:
/discs/data -> /netapp1/data
If netapp1 fails we mount data from netapp2 and relink:
/discs/data -> /netapp2/data
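
The relink step can be sketched as below. This is only an illustration, not the poster's actual procedure: the function name `switch_link` is hypothetical, and it assumes the replacement filesystem is already mounted. It builds the new link under a temporary name and renames it over the old one, so the link is never missing while clients look it up.

```python
import os


def switch_link(link_path, new_target):
    """Repoint the symlink at link_path to new_target in one rename step."""
    # Hypothetical temporary name next to the real link; the pid keeps it
    # unique enough for a sketch.
    tmp_path = "{}.tmp.{}".format(link_path, os.getpid())

    # Clear any leftover from an earlier, interrupted run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)

    # Create the new link under the temporary name first.
    os.symlink(new_target, tmp_path)

    # rename() replaces the old symlink itself (it does not follow it),
    # so readers see either the old target or the new one.
    os.replace(tmp_path, link_path)
```

With the paths from the example above this would be called as `switch_link("/discs/data", "/netapp2/data")`. It does nothing about processes already blocked on the dead server.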
Now this leaves processes hanging on the dead server which can be killed
using fuser -k netapp1:/data. This is only necessary if memory/swap on
the clients is tight, otherwise the "hung" processes will sleep until
the server comes back.
This is not ideal so I had a thought, since we have Quad 100BaseT cards,
in the event of a failure, we could configure up a spare port to be the
same name and IP address as the dead server and therefore take over the
nfs requests for it.
Since we are LIVE I am unable to test this, but subject to the routers
and arp tables becoming updated I would have thought this would work.
I'm not sure about file locking, but I didn't think this works very well
on nfs anyway - does it ? NFS is stateless so it should be ok.
Anyone out there tried this or able to on a couple of test boxes ?
Cheers
Ian Gardner
Vodafone Ltd.
Newbury
UK.
ian.gardner(a)vf.vodafone.co.uk
On 09/09/98 09:36:36 you wrote:
>AFAIK, there's no problem leaving a bad drive sitting in the shelf
>arbitrarily long, as long as you don't need the drive bay for something
>else (unless the drive is bad in a way that might affect other drives,
>like if flames or sparks are coming out of it :-).
One gotcha is that if you leave a bad drive sitting in the shelf,
and the system reboots, it could think the drive is a hot spare
and allow it to be used again. I don't think this happens in all
circumstances (I think there's a way to mark the drive bad such
that it won't be used), but it certainly can (and has) happened in
the past. So it's generally a good idea to remove the bad drive
from the shelf as soon as possible.
Bruce
Well, it's been a week, and as such I suppose time for a summary. Last
week, I posted the following request for comments:
If you are a NetApp F630/FC-AL user and have encountered any
(especially recurring) glitches with interactions between the
Fibre Channel host adapter and OnTAP, or recurring FC hardware
failures I'd appreciate hearing from you.
On the flip side, if you are a NetApp F630/FC-AL user and have
seen perfect performance from your filer, I'd love to hear
about it too.
I was surprised to get only 5 responses, one from a fellow considering
buying one. Out of the four owners that mailed me, two reported flawless
performance, one mentioned problems caused by extreme environmental
conditions, and one reported RAID bugs similar to the ones I had seen.
Netapp replaced his shelf, as they did mine. I will post an update after
we have the new shelf installed if there are any other notables. Their
comment on the problem was "the low-level FC/SCSI system was unaware of a
particular error sequence and was passing error codes to the RAID level
where they caused a panic."
They also recommended upgrading to 5.0.2D4. There were some other FC fixes
rolled into 5.0.2, I'm not sure if there are additional ones in D4.
Steve Clarke <Steven.Clarke(a)ThePLAnet.net> pointed out,
"It has been my opinion for a while that there are people at
NetApp who can make things happen and others who just make the
right noises but do nothing."
This has been my experience, too. However once I've been put in touch with
the Clueful Folk, they have been dedicated above and beyond the call to
resolving my issues. I've heard mumblings out-of-band about the usual low
morale problems among their tech support staff and I sincerely hope that
Netapp management recognise their amazing dedication.
Thanks to Steven Clarke, Matt Stein, Fritz Feltner and Graham Knight for
their time responding to my post.
Alohas,
matto
--matt(a)snark.net-----------------------------------------------------
Matt Ghali MG406/GM023JP - System Administrator, interQ, Inc AS7506
"Sub-optimal is a state of mind." -Dave Rand, <dlr(a)bungi.com>
leila.mutevelic(a)rss.rockwell.com writes:
> Is it true that when one HD goes bad, it will be reconstructed on the
> hotspare, but if you don't pull out bad HD and replace it within the 24
> hours system will shutdown?
No, it's not true.
> What happens in a case if you have 2 hotspare?
If your system has run out of spares and a disk fails, it will be in
degraded mode until you replace one or more spares. If an additional
disk fails while in degraded mode, you are hosed.
Having two spares helps protect you from a second disk failure, as
long as that second failure happens after the first reconstruction has
finished.
However, even with two hot spares, I would recommend replacing
failed disks with new spares as soon as possible.
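
As a rough illustration of the spare behaviour described in this reply, here is a toy state model. It is only one reading of the message, not a description of how OnTAP is implemented: the class name `RaidGroup`, its methods, and the state names are all hypothetical, and it assumes a single RAID group that can survive only one missing disk at a time.

```python
class RaidGroup:
    """Toy model: one RAID group plus a pool of hot spares."""

    def __init__(self, spares):
        self.spares = spares
        # One of: "normal", "reconstructing", "degraded", "data_loss".
        self.state = "normal"

    def fail_disk(self):
        if self.state in ("reconstructing", "degraded"):
            # A second failure before redundancy is restored: "hosed".
            self.state = "data_loss"
        elif self.state == "normal":
            if self.spares > 0:
                # A spare is consumed and the rebuild starts.
                self.spares -= 1
                self.state = "reconstructing"
            else:
                # No spare left: run degraded until one is added.
                self.state = "degraded"
        return self.state

    def finish_reconstruction(self):
        if self.state == "reconstructing":
            self.state = "normal"
        return self.state

    def add_spare(self):
        if self.state == "degraded":
            # The new disk is used for the rebuild straight away.
            self.state = "reconstructing"
        elif self.state != "data_loss":
            self.spares += 1
        return self.state
```

Starting from `RaidGroup(2)`, two failures with a completed reconstruction in between leave the group in "normal" with no spares, while two failures back to back end in "data_loss". That is the reason for replacing failed disks promptly even when spares remain.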
- Dan
I think our system goes into 'degraded' mode when no spare is present.
I'm the degenerate.
(some mail misaddresses running around.)
> Not quite. A failed disk will send the system into 'degenerated' mode if
> there is no spare present. The system will shut down after 24 hours in
> degenerated mode as a failsafe. However, if a spare disk *is* present, the
> system goes into 'reconstruction' mode and rebuilds the failed disk. Once
> reconstructed the system is back up and running as usual with no threat of
> a failsafe shutdown.
> If you have 2 hot spare disks, and one is used to rebuild a failed disk,
> then you will have one spare disk remaining.
>
> leila.mutevelic(a)rss.rockwell.com wrote:
>
> > Is it true that when one HD goes bad, it will be reconstructed on the
> > hotspare, but if you don't pull out bad HD and replace it within the 24
> > hours system will shutdown?
> > What happens in a case if you have 2 hotspare?
> >
> > Thanks,
> > --Leila.
>
>
>
> --
> Shannon Madison <shannon(a)netapp.com>
> Technical Education Manager
>
> Network Appliance, Inc.
> 2770 San Tomas Expressway
> Santa Clara, CA 95051
> USA
>
> Work: (408) 367-3574
> Fax: (408) 367-2151
>
> The usual disclaimers apply.
>
>
----- End of forwarded message from Brian Pawlowski -----
Bad HD
by leila.mutevelic@rss.rockwell.com
08 Sep '98
Is it true that when one HD goes bad, it will be reconstructed on the
hotspare, but if you don't pull out bad HD and replace it within the 24
hours system will shutdown?
What happens in a case if you have 2 hotspare?
Thanks,
--Leila.
What is the latest and most stable version of OnTap for a F630 (Non
F-CAL)? What configuration issues and bugs should I be aware of? Any
Info on configuring CIFS for this version of OnTap would also be
appreciated.
Thanks in advance.
Kevin
Hi there everyone. What is the likelihood of NetApp ever
supporting the CODA filesystem?
http://www.coda.cs.cmu.edu
Thanks!
` ~ ^ ' ~ ' ` ^ ~ ~ ^ ' ~ ` ` ~ ' ^ ` ~ ~ ` " ` ~ ' ^ " ^ ` ~ ' ~ ^ " ' ~ ` ^ ~
"If I seem too inconclusive, well it's just because it's so elusive."
This email has been licensed by the GPL (http://www.gnu.org/philosophy)
Dan ((((now seeking a Linux/Unix sysadmin job in Silicon Valley!)))) Bethe
Dan Bethe <dtm(a)hex.net> writes:
>> Hi there everyone. What is the likelihood of NetApp ever
>> supporting the CODA filesystem?
>> http://www.coda.cs.cmu.edu
Brian Pawlowski <beepy(a)netapp.com> writes:
> As in grandson of AFS, disconnected operation?
>
> I'm curious, who is supporting it now?
Many of our engineers who encounter problems using NFS have asked us
about CODA and AFS. I am more interested in fixing some of the
problems we encounter with NFS rather than introducing new ones, but
CODA might be the answer.
So, does NetApp have a better answer?
- Dan