Attached are the messages I received in response to my query about
clusters. Names have been removed to protect the guilty.
--Brian L. Brush
Senior Systems Administrator
Paradyne Corporation

Clustering is great, if you expect the head to go down a lot. In our
experience, we have had more problems with disks, shelves, LRCs, etc.
going bad, all problems that clustering cannot help with. It does make it
nice when we have to do maintenance, though; we can fail everything over
to the second head, do the maintenance, power up the first head, and
give back the disks to the first head. Repeat for the second head, and
the cluster is ready to go without the user noticing.

Just from personal experience, an F760 cluster is less cost-effective
than an F840c cluster. Due to a combination of hardware and software
limitations with the 7xx series filers, a single F760 filer can handle
1.4TB of data (raw), or up to almost 3TB with ONTAP 6.0. Unfortunately,
a cluster of F760s can also handle only 1.4TB of data, or up to almost
3TB with 6.0. This makes sense when you consider that in a cluster, a
single head will be doing the work of two. With the new 8xx series
filers running 6.0, these limitations have been overcome, meaning a
single filer can handle upwards of 6TB alone, or 12TB in a cluster.
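
To make the arithmetic above concrete, here is a rough Python sketch. The table simply restates the figures quoted in this reply, rounded ("almost 3TB" as 3.0, "upwards of 6TB" as 6.0); the names LIMITS_TB, cluster_gain and per_head_share are made up for illustration and are not anything from ONTAP.

```python
# Raw-capacity ceilings in TB, rounded from the figures quoted above
# ("almost 3TB" -> 3.0, "upwards of 6TB" -> 6.0). Illustrative only.
LIMITS_TB = {
    # (series, release): (single filer, cluster of two)
    ("F760", "pre-6.0"): (1.4, 1.4),
    ("F760", "6.0"): (3.0, 3.0),
    ("8xx", "6.0"): (6.0, 12.0),
}


def cluster_gain(series, release):
    """How many times more raw data a cluster holds than a single filer."""
    single, cluster = LIMITS_TB[(series, release)]
    return cluster / single


def per_head_share(series, release):
    """Raw TB each head serves when data is split evenly across the pair."""
    _, cluster = LIMITS_TB[(series, release)]
    return cluster / 2


if __name__ == "__main__":
    for key in LIMITS_TB:
        print(key, "gain:", cluster_gain(*key), "per head:", per_head_share(*key))
```

A gain of 1.0 for the F760 rows means the second head buys availability but no extra capacity, which is the cost-effectiveness point being made.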

We have had one case where clustering came back and bit us, though.
Something went bad and corrupted a volume. That caused the first filer
to shut down. The cluster partner saw that its partner was down, took over
the disks, found a corrupted volume, and shut down too. That said,
clustering has probably saved us more than it has hurt us. I
guess that's why we have 6 F760 clusters now.
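
The failure described above can be sketched as a toy model in Python. It assumes only what the story says: a head shuts down when it holds a corrupted volume, and a surviving partner takes over the failed head's volumes. The function name, head labels and volume names are hypothetical, and nothing here reflects how ONTAP actually detects corruption.

```python
def surviving_heads(volumes_a, volumes_b, corrupted):
    """Return the set of heads still up once takeovers have played out.

    Toy rule: a head goes down if any volume it currently holds is in
    `corrupted`; its volumes then move to the partner if the partner is up.
    """
    owned = {"A": set(volumes_a), "B": set(volumes_b)}
    up = {"A", "B"}
    changed = True
    while changed:
        changed = False
        for head in sorted(up):
            if owned[head] & set(corrupted):
                up.discard(head)
                partner = "B" if head == "A" else "A"
                if partner in up:
                    # Takeover: the partner inherits the bad volume too.
                    owned[partner] |= owned[head]
                owned[head] = set()
                changed = True
    return up


if __name__ == "__main__":
    # One corrupted volume on head A takes down both heads in this model.
    print(surviving_heads(["vol0"], ["vol1"], {"vol0"}))
    # With no corruption, both heads stay up.
    print(surviving_heads(["vol0"], ["vol1"], set()))
```

The point of the sketch is that takeover protects against a head failing, not against bad data, since the data follows the takeover.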
I like mine. When one died, no one else knew. The cluster partner
even answered pings while I had the system board out on the bench.

We have a pair of F840s scheduled to arrive tomorrow with the
cluster options. It will be a week or so before we're able to test them
and finally roll them into production.
However, we visited the local NetApp office where they had a
pair of F760s running ONTAP 6, and they were clustered. A co-worker and
I spent over two hours playing with it. What happens if we do <this>?
Or <that>? Or pull this cable? Or shut down the filer when it's trying
to fail back? All kinds of cool stuff I would never try with my own
filers!
The long and the short of it was: it behaved as claimed.

The nice thing about NetApp clustering is that you have two
fully functioning filers. You do not have one active filer
and another waiting in "standby" mode. So this means that
you can put half your data on one filer and half on the other.
If one fails, then the other assumes the identity of its
failed partner and pretends to be two different filers.
Of course one filer doing the work of two won't perform as
well as two filers.
To accomplish this feat, each filer needs two FCAL
adapters and each disk shelf has two FCAL adapters. You
connect the primary adapter on filer A to the primary adapters
on filer A's shelves. You connect the secondary adapter
on filer A to the secondary adapters on filer B's shelves.
You connect the primary adapter on filer B to the primary
adapters on filer B's shelves, and I'll bet you can guess
what you connect to filer B's secondary adapter.
In the event of a failover, the surviving filer has access
to its partner's shelves via its secondary adapter.
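
The cabling just described can be written down as a small lookup table, sketched here in Python. The entry for filer B's secondary adapter is filled in by symmetry (the text leaves it as a guess), and the names CABLING, reachable_shelves and path_to are invented for this example.

```python
# (filer, filer adapter) -> (owner of the shelves, shelf adapter)
CABLING = {
    ("A", "primary"): ("A", "primary"),
    ("A", "secondary"): ("B", "secondary"),
    ("B", "primary"): ("B", "primary"),
    # Assumed by symmetry; the description above leaves this one as a guess.
    ("B", "secondary"): ("A", "secondary"),
}


def reachable_shelves(filer):
    """Shelf owners whose shelves this filer is physically cabled to."""
    return sorted({owner for (f, _), (owner, _) in CABLING.items() if f == filer})


def path_to(filer, shelf_owner):
    """Which of the filer's adapters leads to the given owner's shelves."""
    for (f, adapter), (owner, _) in CABLING.items():
        if f == filer and owner == shelf_owner:
            return adapter
    return None


if __name__ == "__main__":
    print(reachable_shelves("A"))
    print(path_to("B", "A"))
```

Each filer reaches its own shelves through its primary adapter and its partner's shelves through its secondary adapter, which is what lets the survivor serve both sets of disks after a failover.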
A failover looks like the failed filer rebooted, so it's not
100% seamless. NFS service will resume, but CIFS clients
must reconnect.

I just set up a 760 cluster. It is pretty cool. I have never had either
filer fail, so I don't really know how the cluster works in "real world"
failures. But I have gone over multiple times and just unplugged one of
the head units to show people how they work. From all the tests I have
done, I am very confident in its abilities. Do you have any specific
questions?
On a side note, the setup of the 760 was so darn easy, I thought I must
have done it wrong. If you do get a 760 cluster, save your money and
install it yourself. It was just two extra cables and four more commands.