On Mon, 9 Aug 1999, Jeffrey Skelton wrote:
- SAN is the future of storage
NetApp uses Fibre Channel, so they're definitely ready for SANs from the hardware perspective. A lot of the vendors you mention still cling to SCSI, which is not easily SANable.
- Netapps and NFS is a lock in to old technology
SCSI is far older. The concept of SANs isn't that new either. Think of NACs with NFS as an intelligent SAN. Whether the fabric is Fibre Channel (for SAN) or Gigabit Ethernet (for NAS), the underlying technology is the same. There may be a bit more overhead and latency with NAS, but how many viable SAN solutions are really out there today?
- Netapps is JBOD
It's as many Little Old Disks as you want. What advantage do Little Old Disks give you? Isn't SAN a JBOD also, really?
- that Eurologic is an unknown scrap vendor
Granted that the tolerances on their shelves could be a bit better, but I've had no big problems with them. They're a hell of a lot better than the DEC shelves NAC sold before.
- Netapps are impossible to back up
Funny, we back them up every night.
- RAID 4 is unreliable
Nothing is reliable. If you have any RAID set - including mirroring - you kill two disks and you're cooked.
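To make the two-disk point concrete, here is a minimal sketch of single-parity XOR, the scheme RAID 4 relies on. The block contents and disk count are made up for illustration, and this is not NetApp's code. With one disk gone, the missing block falls out of the survivors plus parity; with two gone, all you can get back is the XOR of the two missing blocks, not either block itself.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, one stripe each, plus a dedicated parity disk.
data_disks = [b"home", b"dirs", b"wafl"]
parity_disk = xor_blocks(data_disks)

# Lose data disk 1: XOR of the surviving data blocks and parity rebuilds it.
rebuilt = xor_blocks([data_disks[0], data_disks[2], parity_disk])
print(rebuilt == data_disks[1])

# Lose data disks 0 and 1: the survivors only yield disk0 XOR disk1,
# which is not enough to recover either block on its own.
combined = xor_blocks([data_disks[2], parity_disk])
print(combined == xor_blocks(data_disks[:2]))
```

The same arithmetic applies to any single-parity set, which is why the second failure is the one that cooks you.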
- WAFL is slow
What faster alternative are the other vendors proposing? Because WAFL does not have to write to any particular block on disk, it is very fast on writes. Sequential reads, which can - but don't have to - suffer under WAFL, can be optimized with caching to be probably as good as ufs.
- WAFL is impossible to recover when it becomes corrupt
It's really no more impossible than recovering from a ufs corruption. It doesn't differ from ufs that much. With NetApps the opportunity for corrupting a file system is significantly smaller than with ufs.
- wack takes all day to report that you're still in trouble
Other FSes are no better in this respect. If you have to wack you're in really deep doodoo, but in my experience the need to wack is EXTREMELY small.
- the cluster failover takes too long
It takes 1-2 minutes if I recall correctly. What window of outage is acceptable to you? What numbers were the other vendors giving you?
- the cluster failover is unreliable at best
It has worked 100% of the time for me so far.
- the Netapps performance is terribly slow
It doesn't seem that way to me, but I haven't tested them for performance. The performance is adequate for my requirements.
- the software is big pile of patches that are impossible to keep track of
I don't know about this; I'm not a developer. In my experience there are about 2 patch releases before they are rolled into a recommended release.
In general three themes came from the references:
- they lost data with Netapps. Why? Because they could not take a backup with their enterprise backup software and then they suffered a multiple disk failure from a set of disks with problem firmware.
How old are these claims? The NetApp of today is definitely not the NAC of the past.
How much of a problem is the backup situation and does the situation become a problem when it requires a patch for the Netapps software that may introduce other instabilities?
The backup problem I experienced was that Veritas OpenVision had trouble with the newer version of the NetApp OS. The problem was clarified between NetApp and Veritas and it appears that they've worked it out. We're on an older release, 5.2.1P2, and backups appear to work just fine. Now that Veritas has certified 5.3.2, we will be upgrading to the new version to take advantage of new functionality we've been waiting for.
- the Netapps filer was too slow for database use. I could understand how some of the storage arrays directly attached to a server could be faster. This isn't my application space, so I'm not outright concerned with it.
We're not using them for databases currently, so I cannot comment on such use.
But, I have seen some of the recent messages on performance and throughput. How would you rate the Netapps for performance for things like home directory storage?
This is one of our uses. It seems fine.
This concerns me, because I am willing to pay the extra money for a cluster. As long as I'm shelling out, should I go to EMC for reliability or hope that the Netapps failover can work? I can get more Netapps for the money.
NetApp failover does work reliably.
I'm not dumb enough to believe all of the claims that sales droids make. But, I'd like to hear some opinions as to what makes you buy Netapps, what keeps you on Netapps, and what will drive you away from Netapps.
I'll break the answer into several sections.
What I like about NetApps:
- They're simple to configure and administer.
- They allow you to set soft partitions (quotas) to section up large volumes.
- You can resize soft partitions up or down as you please.
- You can grow "hard" volumes by simply adding disks.
- The filesystem design is simple and robust.
- The hardware design is clean and uniform (compare this to Auspex which is a mess composed of UltraSparcs, Pentium IIs, and 386s).
- Clustering is simple and it works.
- Upgrading the OS is a breeze (one simple reboot).
- There is a ton of documentation available on-line explaining different implementation ideas for various environments as well as explanations of architectural features of NetApps.
What I dislike about NetApps (Some of these might have been fixed in 5.3.x):
- Lack of several - most are there - simple commands to test whether everything outside works (there are some hidden NIS commands, but they don't work anymore).
- The inability to change passwords via rsh or in a more secure yet "automatable" manner.
- The inability to kick a user off a telnet session with ease.
- Lack of a rudimentary editor for configuration files, which would make an administrative host largely unnecessary (a simple vi will do).
- The inability to add quotas without stopping and starting the quota system (you can modify sizes of existing quotas, but cannot add quota restrictions without stopping and starting the quota system).
- The lack of an NTP daemon to synchronize the system with the rest of the network.
- The lack of round robin Ethernet trunking support (both of our switch vendors, Cisco and Cabletron, support it). ***This is the largest NetApp performance issue for us.***
- Poor quality cable connectors. They are functionally great, but the thumbscrews need some help. In the past, the rubber molding on the screws would be stripped when trying to gently tighten them with a small screwdriver. On my newest 760s I broke three screws because of a faulty tap nut. The nut didn't allow the screw to come flush with the connector and the screws themselves were so brittle that they broke off. I LOVE the new screw design, but they MUST be made from a more durable material.
- Tolerances on the disk shelves could be better. The golden material used as an isolator between the disks and the case should be attached more uniformly. Sometimes it is difficult to insert a disk without it becoming caught on the said isolator. Also, there is a slim piece of plastic protruding from the bottom and against the side of the shelf. I found that piece of plastic to often be bent down or completely broken off. When bent down the strip prevents the drive from being inserted. In my humble opinion this strip should be removed altogether. It is too thin and too flexible to stay up to spec.
Problems I have with Technical Support:
- Appearance of no clear and methodical path for troubleshooting problems on site. I've had this experience once. Two technicians appeared in my computer room, after repeated nagging for them to show up, to take a look at my filer which was spontaneously rebooting. Initially they appeared to be as stunned as I was. After some time, they proceeded to test the system drive shelf by drive shelf. I should mention that this filer was part of a cluster, and in order to test the drive shelves it had to be disconnected from its clustering partner, which interrupted my users' access to data. Had they not troubleshot the box, the data would have been accessible throughout the day. Because of the troubleshooting, the data were inaccessible during business hours. In retrospect I regret letting them in when I did.
- The Tech Support lines have ungodly waiting times (formerly with annoying music). I doubt that I am exaggerating when I say that I waited for half an hour the last time I called Tech Support. I'm sure NetApp has a better measure of how long I waited for anyone to answer during business hours. I'd like to have it so I can quote it exactly.
- The Tech Support bumps you from tech to tech over several days and poorly communicates their discoveries, i.e. they rarely communicate anything and when they do I can see my grandchildren watch their grandchildren graduate with PhDs.
- The Tech Support sends replacement parts with no explanation. I received parts from NAC on several occasions very promptly, but often unnecessarily and with no communication. These parts did not come because I requested them, but as the outcome of troubleshooting the problem from submitted data without any discussion with the customer. You submit a ticket describing a problem and the parts simply appear. I call NAC asking them why I should replace the parts only to hear that what I described is not a hardware problem. Several weeks later I receive reassurance that it is in fact an exotic bug that is fixed in the recent OS release.
- The Tech Support doesn't keep good track of materials they receive from customers. It seems like every time I call to find out about my tickets I have to resubmit autosupport and potential cores. Can't they just take one of the boxes from the production lines, dump all that data into it, and then reference the location in the ticket database? I'm sick of uploading up to 80MB core files again and again and again.
Regardless of the above harsh criticism, I think that the NACs are overall very good. They're stable in themselves and clustering works if you need to perform any maintenance on the head unit itself. The technical support organization needs work, but I think they realize it and are moving in a good direction. My gut feeling is that they're becoming more expert on their product, but you still have to wait 30 minutes to get through to them. I'd recommend them over Auspex for their simple and robust OS, clearcut hardware design, ease of use, and speed of cleanup after a crash. With NetApps I had no cleanups; yank a cord, plug it back in, and it continues chugging within 2 minutes. Try that with an Auspex. The same goes for many directly attached devices.
Tom