Any advice on what we should really include in our eval to really test the box out?
I'd try to mirror/simulate as closely as possible whatever workload you expect to place on your filers when you put them into production.
Don't forget to eval whatever backup/restore sw you are looking at at the same time.
You might want to test out NA customer support. Put your filer into some situations that you need to call support for and see if support can provide you the assistance you need.
"Brian" == Brian Tao taob@risc.org writes:

Brian> Try the extreme cases like very large directories (10k's or 100k's
Brian> of files), very deep directory hierarchies, large files
Brian> (2GB+) and intense file locking activity (something that
Brian> always sucks over NFS).
It's neat to see how a filer performs under those conditions, but I expect you have a pretty good idea of the workload you plan to place on your filers. You may know that you aren't ever going to have to deal with 3GB files or 100k entry directories.
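If you do want to exercise the cases Brian lists, here is a rough Python sketch of generating them from an NFS client. Nothing in it comes from NA; the function names, directory names and sizes are made up for illustration, and it assumes a Unix client where the `fcntl` module is available.

```python
import fcntl
import os
import time


def make_wide_dir(root, count):
    """Create `count` empty files in a single directory; return that directory."""
    wide = os.path.join(root, "wide")  # hypothetical directory name
    os.makedirs(wide, exist_ok=True)
    for i in range(count):
        with open(os.path.join(wide, f"f{i:07d}"), "w"):
            pass
    return wide


def make_deep_tree(root, depth):
    """Create a chain of `depth` nested directories; return the deepest path."""
    deepest = os.path.join(root, "deep", *(["d"] * depth))
    os.makedirs(deepest, exist_ok=True)
    return deepest


def make_large_file(path, size_bytes):
    """Create a file of `size_bytes` by seeking to the last byte and writing it.

    The result is typically sparse, so this checks large-offset handling,
    not sustained write throughput.
    """
    with open(path, "wb") as fh:
        fh.seek(size_bytes - 1)
        fh.write(b"\0")
    return os.path.getsize(path)


def time_lock_cycles(path, cycles):
    """Take and release an exclusive POSIX lock `cycles` times; return seconds."""
    with open(path, "a+b") as fh:
        start = time.monotonic()
        for _ in range(cycles):
            fcntl.lockf(fh, fcntl.LOCK_EX)
            fcntl.lockf(fh, fcntl.LOCK_UN)
        return time.monotonic() - start


def time_listing(directory):
    """Return (entry count, seconds taken) for one listing of `directory`."""
    start = time.monotonic()
    entries = os.listdir(directory)
    return len(entries), time.monotonic() - start
```

Point the functions at a scratch directory on the filer mount, for example `make_wide_dir(scratch, 100_000)` followed by `time_listing(...)` on the result, and run `time_lock_cycles` from several clients at once against the same file if you want real lock contention.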
So, I'd definitely try some extreme cases that result from events beyond your control (i.e., hw failures):
Try pulling a disk and testing out RAID reconstruct. Turn the filer off in degraded mode and see if it comes back okay. If you are thinking about clustering, get a clustered pair on eval. Exercise the clustering. Try every scenario you can think of to force a failover (e.g., turn a filer off, pull a filer's fan, break a filer's FC-AL A-loop).
Try pulling a filer's disk, then while it is doing a reconstruct, turn it off to force a takeover. See if the partner does a proper takeover and begins a reconstruct. Pull a disk on the opposite filer so that it is doing two reconstructs at once.
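To put a number on how long clients stall during each forced failover, a small probe running on an NFS client helps. The following is only a sketch under assumptions: the function names are invented here, and it relies on the usual behaviour of a hard NFS mount, where a write blocks during the takeover rather than failing, so the outage shows up as a long gap between completed writes.

```python
import os
import sys
import time


def probe_once(path):
    """Append a timestamp, force it to the server, return the completion time."""
    with open(path, "a") as fh:
        fh.write(f"{time.time():.3f}\n")
        fh.flush()
        os.fsync(fh.fileno())
    return time.monotonic()


def longest_gap(times):
    """Return the largest interval between consecutive completed probes."""
    return max((b - a for a, b in zip(times, times[1:])), default=0.0)


def run_probe(path, interval, duration):
    """Probe `path` every `interval` seconds for `duration` seconds.

    Returns the completion times of the writes that succeeded. On a soft
    mount a write may raise OSError instead of blocking; that is logged
    and skipped, so the gap still appears in the returned times.
    """
    done = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            done.append(probe_once(path))
        except OSError as exc:
            print(f"write failed: {exc}", file=sys.stderr)
        time.sleep(interval)
    return done
```

Start `run_probe` against a file on the filer before you pull the disk or power off the head, let it run through the takeover and the giveback, then pass the result to `longest_gap` to see the worst stall a client would have experienced.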
Try adding a shelf to a clustered pair w/o having both filers down. This is supposed to work and is a documented procedure from NA.
Just for kicks, try out some catastrophic things so you can see what they look like. Pull two disks from the same RAID group. Try turning off a shelf.
It has been my experience that NA's are great once you get them up and running if you don't touch them. If you do _anything_ out of the norm (i.e., any h/w maintenance procedure, or try to utilize any recently introduced feature, such as SnapMirror), you have a 50/50 chance of exercising some bug. Our filers have actually been _less_ reliable since we clustered them. We've had two extended downtimes (> 1 hour) when trying to take advantage of the clustering in order to perform a zero-downtime maintenance. I'm not sure I ever want to type 'cf takeover' on my filers again.
Before we clustered our F740's, we had an F540 and an F630 that never went down. We then ran an F740 for over a year with no trouble. When we finally got a pair of F740's and clustered them, we started having problems.
Good luck. They are truly wonderful devices when they work.
j.
--
Jay Soffian jay@cimedia.com
UNIX Systems Engineer
Cox Interactive Media