To test the SCSI reset / filer reboot theory:
I started an NDMP backup to a SCSI-attached tape / library. Once data was being written to tape, I disconnected the SCSI cable. The backup aborted, and the SCSI bus did reset. The filer (F760, ONTAP 5.3.5) did not reboot. NFS and CIFS operations were not impacted.
Thanks, Bill Roth
This was about 4 months ago now. Let's see if I can remember the sequence of events accurately ...
F740 running OnTap 5.2.1, DLT4700 stacker direct SCSI attached, using BudTool 4.6. At 6:00 we had a backup scheduled to run. The tape was requested to load but got jammed when the drive started to latch and spool it. BudTool continued to retry for approx 4 hours while the filer reported SCSI bus errors. At approx 10:00 the filer rebooted (on its own) to try to clear the bus error. The error wasn't cleared. Around 14:00 we had a replacement stacker and swapped them hot. (Then we got to dismantle the stacker to try to get the tape out.)
At the time I can remember we were thinking 'Cool. The NetApp rebooted, resulting in 90s (or so) downtime and no one even noticed the interruption in service.' Not one complaint to our help desk. Would have been a different story if we were still using the Auspex.
So the circumstances are different than your test case, but OnTap 5.3.5 might be better at handling these sorts of errors.
For our new filer we want to avoid any reboots of this sort due to tape drive SCSI errors (for the existing filer it's less important). For performance reasons I would prefer to direct SCSI attach the tape drives, but on the way in today I was thinking we might be able to hang all the tape drives off our existing filer and back up the new filer that way (once I figure out the security implications).