Has anyone experienced any of the following errors on their F330 (or any other) filer running Release 4.0.1c?
1.) Unexplained reboots
2.) portmapper errors -> [portmapper]: portmap: prog (100004,2,2) not registered
3.) rsh hangs -> "system hung (rshd_0)!"
4.) machine check -> MACHINE CHECK: addr 0037470c,mctd,PCMC ERRSTS 8
5.) crash -> "trap 13 code=0 eip=20625f cs=8 eflangs=10202"
Any help would be appreciated.
Has anyone experienced any of the following errors on their F330 (or any other) filer running Release 4.0.1c?
It's probably appropriate to address all of these questions to NetApp Support... they can help with this.
1.) Unexplained reboots
In my experience these are usually caused by the watchdog timer resetting the system in the event of a fatal error. This is much better than hanging the system forever. If you download the latest firmware revision from now.netapp.com, you will get confirmation of a watchdog event in the syslog messages.
2.) portmapper errors-> [portmapper]: portmap: prog (100004,2,2) not registered
This is caused by other hosts on the network trying to contact an RPC service that the filer doesn't provide. The example above is quite common because program 100004 is the NIS server, and clients broadcast to find one. You may see other program numbers too (proprietary ones used by various services, for example) if those clients try to talk to the filer.
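If you want to see which services hosts are asking for, something like the following could pull the program number out of each message and look it up. This is only a rough sketch: the table lists a few standard ONC RPC program numbers, the function name `describe_unregistered` is made up for illustration, and I haven't tried to interpret the second and third fields of the tuple.

```python
import re

# A few standard ONC RPC program numbers (not a complete list).
RPC_PROGRAMS = {
    100000: "portmapper",
    100003: "nfs",
    100004: "ypserv (NIS server)",
    100005: "mountd",
    100007: "ypbind (NIS binder)",
}

# Matches the tuple in messages such as:
#   [portmapper]: portmap: prog (100004,2,2) not registered
PROG_RE = re.compile(r"prog \((\d+),(\d+),(\d+)\) not registered")


def describe_unregistered(line):
    """Return (program number, service name), or None if the line doesn't match."""
    match = PROG_RE.search(line)
    if match is None:
        return None
    prog = int(match.group(1))
    return prog, RPC_PROGRAMS.get(prog, "unknown")


if __name__ == "__main__":
    msg = "[portmapper]: portmap: prog (100004,2,2) not registered"
    print(describe_unregistered(msg))
```

Run over a copy of the messages file, this would show whether it's all NIS broadcasts or whether something else is probing the filer.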
I think setting your /etc/syslog.conf to not log at debug level will suppress these messages.
3.) rsh hangs -> "system hung (rshd_0)!"
This sounds like a definite bug... NetApp Support can analyze the core file.
4.) machine check -> MACHINE CHECK: addr 0037470c,mctd,PCMC ERRSTS 8
I *think* these are usually memory parity errors, but again Support would be able to tell you for sure. In this case, the server reboots rather than corrupting your data... this is a good thing! If it happens rarely it should not be an issue; if it's happening too often you may need a memory or motherboard replacement. For more reliability you could consider an F540, F520, or F630... they have ECC memory, which corrects single-bit errors.
5.) crash -> "trap 13 code=0 eip=20625f cs=8 eflangs=10202"
The crash from 5 is also something Support should analyze. Looks like a seg fault.
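For what it's worth, the fields in that line can be picked apart before sending it to Support. The sketch below assumes the filer is reporting a raw x86 exception vector (on x86, vector 13 is the general protection fault), that the trap number is decimal, and that the other fields are hex. Those are my assumptions, not something NetApp has confirmed, and `decode_trap` is just an illustrative name. The flag names are the standard x86 EFLAGS bits.

```python
import re

# A few x86 exception vectors (decimal).
X86_TRAPS = {
    0: "divide error",
    6: "invalid opcode",
    8: "double fault",
    12: "stack-segment fault",
    13: "general protection fault",
    14: "page fault",
}

# Standard x86 EFLAGS bit positions (reserved bits omitted).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 16: "RF", 17: "VM",
}

# The flags field name is matched loosely because the message as posted
# spells it "eflangs".
TRAP_RE = re.compile(
    r"trap (\d+) code=([0-9a-fA-F]+) eip=([0-9a-fA-F]+) "
    r"cs=([0-9a-fA-F]+) efl\w*=([0-9a-fA-F]+)"
)


def decode_trap(line):
    """Decode a trap line into a dict, or return None if it doesn't match."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    trap = int(match.group(1))           # assumed decimal
    eflags = int(match.group(5), 16)     # assumed hex
    flags = [name for bit, name in sorted(EFLAGS_BITS.items())
             if (eflags >> bit) & 1]
    return {
        "trap": trap,
        "name": X86_TRAPS.get(trap, "unknown"),
        "code": int(match.group(2), 16),
        "eip": int(match.group(3), 16),
        "cs": int(match.group(4), 16),
        "flags": flags,
    }


if __name__ == "__main__":
    print(decode_trap("trap 13 code=0 eip=20625f cs=8 eflangs=10202"))
```

None of this replaces the core file; it only makes it easier to tell at a glance whether repeated crashes hit the same trap and instruction address.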
Bruce