We're currently evaluating several solutions for hosting what will initially be about 1TB of data, but that will probably grow to several times that over the next 18 months.
We already have a small F740 that has been chugging away for the past four months, and we've been extremely happy with it.
The configuration we're considering is clustering two F760s. Each filer would serve its own data and also be physically connected to the disks that the other filer serves. From what I understand, this is the standard clustering configuration. We would plan to add more filers in pairs as we outgrow the initial pair.
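If I'm reading the clustered failover documentation right, the day-to-day administration boils down to a handful of console commands, something like the following (the prompts and hostnames are just placeholders, and I'd love to hear whether takeover/giveback is really this simple in practice):

    filer1> cf enable      # turn on takeover between the two partners
    filer1> cf status      # check that each head can see its partner
    filer1> cf takeover    # manually take over filer2's disks, e.g. for planned maintenance
    filer1> cf giveback    # hand filer2's disks back once it's healthy again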
The main NFS clients would be a bunch of Suns running as application servers. They'll take requests, munge data, and return results to other parts of the site. All the application servers would be able to see all the data. Since the application servers don't hold any data, they would all be identical (and expendable).
In addition, we're considering adding a second RAID volume to each filer and using snapmirror against the other filer's primary volume. This would protect us against a catastrophic failure of one of the primary RAID devices. We'd run snapmirror every five minutes or so to keep the secondary up to date.
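From my reading of the docs, that five-minute schedule would live in /etc/snapmirror.conf on the destination side, one line per mirrored volume; something like the entry below on filer2 (the volume names are made up, and there would presumably be a matching entry on filer1 for the mirror running the other direction):

    # /etc/snapmirror.conf on filer2 -- pull filer1's primary volume every five minutes
    filer1:vol1   filer2:vol1_mirror   -   0,5,10,15,20,25,30,35,40,45,50,55 * * *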
Each filer would have a primary and a secondary ethernet interface connected to different edge devices on a switched gigabit network. This would ensure that if a network edge device went down, the filer would still be able to reach the backbone.
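What I have in mind for that failover pair is a single-mode vif (one link active, the other standing by), assuming that feature is in the ONTAP release we'd be running; roughly this in each filer's /etc/rc, with the interface names and address below being pure placeholders:

    # /etc/rc fragment -- two gigabit ports bundled into one failover interface
    vif create single vif0 e9 e10
    ifconfig vif0 10.0.0.11 netmask 255.255.255.0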
Backups would happen with traditional methods using a snapshot of the mirror copy.
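The rough idea (and the details here are exactly what I'm hoping someone can confirm) is to run the dump from the filer holding the mirror, pointing it at a snapshot of the mirror volume, so the primary never sees any backup load:

    # on the filer holding the mirror copy; volume, snapshot and tape names are placeholders
    filer2> dump 0f rst0a /vol/vol1_mirror/.snapshot/hourly.0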
Here's a cheap ASCII picture of what I poorly described above:
    +---------------+             +---------------+
    |snapmirror of  |             |snapmirror of  |
    |filer 2 primary|             |filer 1 primary|
    +---------------+             +---------------+
            |                             |
            |          +-------+          |
            |  +-------|filer 1|-------+  |
            |  |       |primary|       |  |
           +-------+   | disks |   +-------+
gb ether---|       |   +-------+   |       |---gb ether
           |filer 1|               |filer 2|
gb ether---|       |               |       |---gb ether
           +-------+   +-------+   +-------+
               |       |filer 2|       |
               +-------|primary|-------+
                       | disks |
                       +-------+

gb ether  gb ether  gb ether  gb ether  gb ether  gb ether
   |         |         |         |         |         |
+------+  +------+  +------+  +------+  +------+  +------+
| app  |  | app  |  | app  |  | app  |  | app  |  | app  |
|server|  |server|  |server|  |server|  |server|  |server|
+------+  +------+  +------+  +------+  +------+  +------+
I'm looking for any experiences you have had with this setup or any subset of the picture. Specifically, I'd like to hear about experiences with:
- Clustering: Does it work for you? Have you had a filer die and had the secondary take over? Have you had a filer die and the secondary NOT take over?
- Snapmirror: Do you use it? How does frequent mirroring affect performance of both the sending and receiving systems? Have you had any consistency or other problems with it? Can you take snapshots of the receiving disk?
- Multiple paths to the network: Can you aggregate bandwidth through multiple gigabit connections to the same network? What about setting one interface as the primary and the other as a secondary?
- WAFL: Good or bad things to say about the filesystem.
- NFS issues: NFS is famous for lockups, locking problems, stuck mounts, etc. We won't be doing any locking over NFS mounts and will be using whatever version of Solaris is recommended at the time. Anything good or bad that can be said about the combination of a NetApp and Sun (the vfstab sketch just below this list shows roughly how we'd plan to mount things).
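For what it's worth, the mounts we have in mind on the Solaris side look roughly like the vfstab line below -- NFSv3 over TCP, hard mounts, 32K transfer sizes -- though the paths and options are just what we've pieced together from list archives, not anything we've settled on:

    # /etc/vfstab entry on each app server (filer and path names are placeholders)
    filer1:/vol/vol1  -  /data/vol1  nfs  -  yes  vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768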
Any other information or advice regarding this arrangement would, of course, be welcome. This is being considered as an alternative to the traditional method of hanging a RAID off each server, but there are still some big hangups about NFS in general. Good (or bad) experiences regarding this would be appreciated.
Thanks for reading this far and thanks in advance for any knowledge you can pass on.
Joe Gross