Joe, I'll give you a quick picture of the setup we're running at Ericsson in Stockholm, serving 8000 users today.
6 F760s in cluster pairs, each the primary owner of 750 GB of disk
3 standalone F760s as SnapMirror replicas
In total, ~9 TB of disk.
They're running in a mixed CIFS and UNIX environment.
We installed this setup at the beginning of the year and ran across a couple of bugs in the CIFS implementation, but overall we're really happy with what we have, and most importantly, it's doing a lot of good for the organisation using it.
Here are a few hints.
- Clustering works great! Both manual and automatic.
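If you want to try a manual takeover yourself, it's just a few commands on the console. A rough sketch from memory (the exact syntax may differ on your Data ONTAP release):

    filer1> cf status      (check that failover is enabled and the partner is up)
    filer1> cf takeover    (filer1 takes over filer2's disks and network identity)
    ... run like that while you service filer2 ...
    filer1> cf giveback    (hand everything back once filer2 is healthy again)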
- SnapMirror: I would not put my mirrors on the same heads. First of all, I like to move the data to a different location. Putting the mirrors on the cluster heads also lowers the supported amount of data per head (today there is a 750 GB limit in cluster configs, due to sharing NVRAM etc.). And it's really fast to have a SnapMirror head take the place of a destroyed cluster.
With the standard throttle it doesn't take too much power out of the heads. We point the SnapMirror traffic at "dedicated" NICs on the heads so it doesn't interfere with client traffic (a sketch of the config follows this item).
We take our backups from the SnapMirror heads over a separate "storage SAN", which gives us a larger backup window and doesn't affect the clients in any way.
No snapshots on the destinations; the mirrored volumes are read-only.
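In case it helps: the throttle and the schedule both live in /etc/snapmirror.conf on the destination head. A rough sketch with made-up names (filer1 being a cluster head, mirror1 a standalone SnapMirror head; check the snapmirror.conf man page on your release for the exact schedule syntax):

    # source       destination    throttle   minute hour day-of-month day-of-week
    filer1:vol1    mirror1:vol1   kbs=2000   0-55/5 * * *

Here kbs=2000 caps the transfer at 2000 KB/s (a made-up number, tune to taste) and 0-55/5 means every five minutes.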
- Network balancing: today there is no failover/balancing support for my GB NICs, so I keep 3 in every head: one for client access, one for failover, and one for snapmirroring. I've heard there is going to be GB EtherChannel support, which would give failover/load balancing; try to get an update from your sales rep.
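Splitting the traffic is nothing fancy, just one interface per role and a dedicated name for the mirror head's SnapMirror NIC so replication never touches the client network. A sketch with invented interfaces and addresses:

    # /etc/rc on filer1 (addresses are made up)
    ifconfig e0 192.168.1.10 netmask 255.255.255.0    # client access
    ifconfig e1 192.168.1.11 netmask 255.255.255.0    # failover / standby
    ifconfig e2 192.168.2.10 netmask 255.255.255.0    # snapmirror only

    # /etc/hosts: name the mirror head's snapmirror NIC and use that
    # name in /etc/snapmirror.conf
    192.168.2.20    mirror1-sm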
- WAFL: no problems. Snapshots make it easy for the users to recover files themselves and have saved us a couple of times. The RAID is working well.
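The self-service recovery is just the .snapshot directory, and the schedule is set per volume. A sketch of the kind of schedule we might run (the numbers are weekly/nightly/hourly snapshot counts; pick your own):

    filer1> snap sched vol1 2 6 8@8,12,16,20
    (keep 2 weekly, 6 nightly, and 8 hourly snapshots taken at 08:00, 12:00, 16:00 and 20:00)

A user who clobbers a file just copies it back himself, no admin involved:

    % cp ~/.snapshot/hourly.0/report.txt ~/report.txt

CIFS users see the same thing as ~snapshot.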
- NFS: I haven't run into any problems with any NFS platform (Sun, HP, IBM etc.), just really good NFS performance.
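Nothing special is needed on the Sun side either; a plain NFSv3 mount works. A sketch of what a vfstab line might look like, with invented names and paths (tune rsize/wsize and proto for your setup):

    # /etc/vfstab on a Solaris app server
    filer1:/vol/vol1/data  -  /data  nfs  -  yes  vers=3,proto=udp,rsize=32768,wsize=32768,hard,intr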
Let me know if there is anything I can answer for you.
Good Luck
Anders Ljungberg
-----Original Message-----
From: Joe Gross [mailto:jgross@stimpy.net]
Sent: Tuesday, December 07, 1999 7:47 AM
To: toasters@mathworks.com
Subject: experiences with clustering and snapmirror
We're currently evaluating several solutions for hosting initially about 1 TB, which will probably grow to several times that over the next 18 months.
We've had a small 740 happily chugging away for the past 4 months and have been extremely happy with it.
The configuration we're considering is clustering two F760s. Each filer will serve its own data and also be physically connected to the disks that the other filer serves. From what I understand this is the standard clustering configuration. We would plan to add more filers in pairs as we outgrow the initial pair.
The main NFS clients would be a bunch of Suns that run as application servers. They'll take requests, munge data, and return results to other parts of the site. All the application servers would be able to see all the data. Since the application servers don't hold any data, they would all be identical (and expendable).
In addition, we're considering adding a second raid volume to each filer and using snapmirror against the other filer's primary volume. This would protect us against the possibility of one of the primary raid devices having a catastrophic failure. We'd run snapmirror every five minutes or so to keep the secondary updated.
Each filer would have a primary and secondary ethernet interface connected to different edge devices on a switched GB network. This would assure that if a network edge device went down the filer would still be able to reach the backbone.
Backups would happen with traditional methods using a snapshot of the mirror copy.
Here's a cheap ascii picture of what I poorly described above:
       +---------------+                    +---------------+
       |snapmirror of  |                    |snapmirror of  |
       |filer 2 primary|                    |filer 1 primary|
       +---------------+                    +---------------+
               |                                    |
               |             +-------+              |
               |  +----------|filer 1|----------+   |
               |  |          |primary|          |   |
            +-------+        | disks |        +-------+
 gb ether---|       |        +-------+        |       |---gb ether
            |filer 1|                         |filer 2|
 gb ether---|       |                         |       |---gb ether
            +-------+        +-------+        +-------+
                |            |filer 2|            |
                +------------|primary|------------+
                             | disks |
                             +-------+
gb ether  gb ether  gb ether  gb ether  gb ether  gb ether
   |         |         |         |         |         |
+------+  +------+  +------+  +------+  +------+  +------+
| app  |  | app  |  | app  |  | app  |  | app  |  | app  |
|server|  |server|  |server|  |server|  |server|  |server|
+------+  +------+  +------+  +------+  +------+  +------+
I'm looking for any experiences you have had with this or any subset of the picture. Specifically, I'd like to hear experiences about:
- Clustering: Does it work for you? Have you had a filer die and had the secondary take over? Have you had a filer die and the secondary NOT take over?
- Snapmirror: Do you use it? How does frequent mirroring affect performance of both the sending and receiving systems? Have you had any consistency or other problems with it? Can you take snapshots of the receiving disk?
- Multiple paths to the network. Can you aggregate bandwidth through multiple gigabit connections to the same network? What about setting one interface as the primary and the other as a secondary?
- WAFL: Good or bad things to say about the filesystem.
- NFS issues: NFS is famous for problems with lockups, locking problems, stuck mounts, etc. We won't be doing any locking over NFS mounts and will be using whatever version of Solaris is recommended at the time. Anything good or bad that can be said about the combination of a NetApp and a Sun.
Any other information or advice regarding this arrangement would, of course, be welcome. This is being considered as an alternative to the traditional method of hanging a RAID off each server, but there are still some big hangups about NFS in general. Good (or bad) experiences regarding this would be appreciated.
Thanks for reading this far and thanks in advance for any knowledge you can pass on.
Joe Gross