Joe, I'll give you a quick picture of the setup we're running at Ericsson in Stockholm, serving 8000 users today.
6 F760s in cluster pairs, each the primary owner of 750 GB of disk
3 standalone F760s as SnapMirror replicas
In total, ~9 TB of disk.
They're running in a mixed CIFS and UNIX environment.
We installed this setup at the beginning of the year and ran across a couple of bugs in the CIFS implementation, but overall we're really happy with what we have, and most importantly, it's doing a lot of good for the organisation using it.
Here are a few hints.
- Clustering works great! Both manual and automatic.
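If you want to try a manual takeover yourself, it's just a few commands on the console. A rough sketch from memory (the exact syntax may differ on your Data ONTAP release):

    filer1> cf status      (check that failover is enabled and the partner is up)
    filer1> cf takeover    (filer1 takes over filer2's disks and network identity)
    ... run like that while you service filer2 ...
    filer1> cf giveback    (hand everything back once filer2 is healthy again)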
- SnapMirror: I would not put my mirrors on the same heads. First of all, I like to move the data to a different location. Putting the mirrors on the cluster heads also lowers the supported amount of data per head (today there is a 750 GB limit in cluster configs, due to sharing NVRAM etc.). And it's really fast to have a SnapMirror head take the place of a destroyed cluster.
With the standard throttle it doesn't take too much power out of the heads. We point the SnapMirror traffic at "dedicated" NICs on the heads so it doesn't interfere with client traffic (a sketch of the config follows this item).
We take our backups from the SnapMirror heads over a separate "storage SAN", which gives us a larger backup window and doesn't affect the clients in any way.
No snapshots on the destinations; the mirrored volumes are read-only.
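In case it helps: the throttle and the schedule both live in /etc/snapmirror.conf on the destination head. A rough sketch with made-up names (filer1 being a cluster head, mirror1 a standalone SnapMirror head; check the snapmirror.conf man page on your release for the exact schedule syntax):

    # source       destination    throttle   minute hour day-of-month day-of-week
    filer1:vol1    mirror1:vol1   kbs=2000   0-55/5 * * *

Here kbs=2000 caps the transfer at 2000 KB/s (a made-up number, tune to taste) and 0-55/5 means every five minutes.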
- Network balancing: today there is no failover/balancing support for my GB NICs, so I keep 3 in every head: one for client access, one for failover, and one for snapmirroring. I've heard there is going to be GB EtherChannel support, which would give failover/load balancing; try to get an update from your sales rep.
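Splitting the traffic is nothing fancy, just one interface per role and a dedicated name for the mirror head's SnapMirror NIC so replication never touches the client network. A sketch with invented interfaces and addresses:

    # /etc/rc on filer1 (addresses are made up)
    ifconfig e0 192.168.1.10 netmask 255.255.255.0    # client access
    ifconfig e1 192.168.1.11 netmask 255.255.255.0    # failover / standby
    ifconfig e2 192.168.2.10 netmask 255.255.255.0    # snapmirror only

    # /etc/hosts: name the mirror head's snapmirror NIC and use that
    # name in /etc/snapmirror.conf
    192.168.2.20    mirror1-sm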
- WAFL: no problems. Snapshots make it easy for the users to recover files themselves and have saved us a couple of times. The RAID is working well.
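The self-service recovery is just the .snapshot directory, and the schedule is set per volume. A sketch of the kind of schedule we might run (the numbers are weekly/nightly/hourly snapshot counts; pick your own):

    filer1> snap sched vol1 2 6 8@8,12,16,20
    (keep 2 weekly, 6 nightly, and 8 hourly snapshots taken at 08:00, 12:00, 16:00 and 20:00)

A user who clobbers a file just copies it back himself, no admin involved:

    % cp ~/.snapshot/hourly.0/report.txt ~/report.txt

CIFS users see the same thing as ~snapshot.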
- NFS: I haven't run into any problems with any NFS platform (Sun, HP, IBM etc.), just really good NFS performance.
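Nothing special is needed on the Sun side either; a plain NFSv3 mount works. A sketch of what a vfstab line might look like, with invented names and paths (tune rsize/wsize and proto for your setup):

    # /etc/vfstab on a Solaris app server
    filer1:/vol/vol1/data  -  /data  nfs  -  yes  vers=3,proto=udp,rsize=32768,wsize=32768,hard,intr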
Let me know if there is anything I can answer for you.
Good Luck
Anders Ljungberg
-----Original Message-----
From: Joe Gross [mailto:jgross@stimpy.net]
Sent: Tuesday, December 07, 1999 7:47 AM
To: toasters@mathworks.com
Subject: experiences with clustering and snapmirror
We're currently evaluating several solutions for hosting initially about 1 TB, which will probably grow to several times that over the next 18 months.
We've had a small 740 happily chugging away for the past 4 months and have been extremely happy with it.
The configuration we're considering is clustering two F760s. Each filer will serve its own data and also be physically connected to the disks that the other filer serves. From what I understand this is the standard clustering configuration. We would plan to add more filers in pairs as we outgrow the initial pair.
The main NFS clients would be a bunch of Suns that run as application servers. They'll take requests, munge data, and return results to other parts of the site. All the application servers would be able to see all the data. Since the application servers don't hold any data, they would all be identical (and expendable).
In addition, we're considering adding a second raid volume to each filer and using snapmirror against the other filer's primary volume. This would protect us against the possibility of one of the primary raid devices having a catastrophic failure. We'd run snapmirror every five minutes or so to keep the secondary updated.
Each filer would have a primary and secondary ethernet interface connected to different edge devices on a switched GB network. This would assure that if a network edge device went down the filer would still be able to reach the backbone.
Backups would happen with traditional methods using a snapshot of the mirror copy.
Here's a cheap ascii picture of what I poorly described above:
       +---------------+                    +---------------+
       |snapmirror of  |                    |snapmirror of  |
       |filer 2 primary|                    |filer 1 primary|
       +---------------+                    +---------------+
               |                                    |
               |             +-------+              |
               |  +----------|filer 1|----------+   |
               |  |          |primary|          |   |
            +-------+        | disks |        +-------+
 gb ether---|       |        +-------+        |       |---gb ether
            |filer 1|                         |filer 2|
 gb ether---|       |                         |       |---gb ether
            +-------+        +-------+        +-------+
                |            |filer 2|            |
                +------------|primary|------------+
                             | disks |
                             +-------+
gb ether  gb ether  gb ether  gb ether  gb ether  gb ether
   |         |         |         |         |         |
+------+  +------+  +------+  +------+  +------+  +------+
| app  |  | app  |  | app  |  | app  |  | app  |  | app  |
|server|  |server|  |server|  |server|  |server|  |server|
+------+  +------+  +------+  +------+  +------+  +------+
I'm looking for any experiences you have had with this or any subset of the picture. Specifically, I'd like to hear experiences about:
- Clustering: Does it work for you? Have you had a filer die and had the secondary take over? Have you had a filer die and the secondary NOT take over?
- Snapmirror: Do you use it? How does frequent mirroring affect performance of both the sending and receiving systems? Have you had any consistency or other problems with it? Can you take snapshots of the receiving disk?
- Multiple paths to the network. Can you aggregate bandwidth through multiple gigabit connections to the same network? What about setting one interface as the primary and the other as a secondary?
- WAFL: Good or bad things to say about the filesystem.
- NFS issues: NFS is famous for problems with lockups, locking problems, stuck mounts, etc. We won't be doing any locking over NFS mounts and will be using whatever version of Solaris is recommended at the time. Anything good or bad that can be said about the combination of a NetApp and a Sun.
Any other information or advice regarding this arrangement would, of course, be welcome. This is being considered as an alternative to the traditional method of hanging a RAID off each server, but there are still some big hangups about NFS in general. Good (or bad) experiences regarding this would be appreciated.
Thanks for reading this far and thanks in advance for any knowledge you can pass on.
Joe Gross