We've just tracked down an evil problem where corrupt data is being returned on NFS reads from a NetApp.
This turned out to be caused by dodgy network cables introducing packet corruption - word-aligned (16-bit) values of 0x0000 and 0xFFFF apparently produce the same TCP checksum result, so any corruption which interchanges those values gets through to the application. Our application has a file which is a bitmap and contains a lot of this kind of data - so it's just the sort of file you'd expect this problem to occur on.
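To illustrate the checksum behaviour, here is a minimal sketch of the 16-bit one's-complement sum as described in RFC 1071. It is not taken from any real TCP stack; the function name `internet_checksum` and the sample byte strings are made up for the example. In one's-complement arithmetic 0x0000 and 0xFFFF are both representations of zero, so swapping one for the other should leave the sum unchanged, provided at least one other word in the summed data is non-zero (which a real packet header would ensure).

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071 (illustrative only)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in (end-around carry)
    return ~total & 0xFFFF


# Hypothetical payloads: identical except that one aligned word flips 0x0000 <-> 0xFFFF.
good = bytes.fromhex("12340000abcd")
bad = bytes.fromhex("1234ffffabcd")

# A different corruption of the same word, for comparison.
other = bytes.fromhex("12340001abcd")

print(internet_checksum(good) == internet_checksum(bad))    # the swap goes undetected
print(internet_checksum(good) == internet_checksum(other))  # this one is caught
```

The swap only hides when it lands on a 16-bit boundary, which matches the "word-aligned" observation above.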
Obviously, now that we know about this, there's stuff to do - monitor CRC errors more carefully, and perhaps change the structure of our data. But I was wondering whether anyone else has experienced similar problems and come up with any imaginative solutions to them?
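As one possible shape for "changing the structure of our data", here is a rough sketch of an application-level check: store a CRC-32 per fixed-size block when writing, and recompute it after reading. This is only an assumption about what such a change might look like, not something we have deployed; the block size and the helper names `block_crcs` and `corrupt_blocks` are hypothetical. It uses `zlib.crc32` from the Python standard library.

```python
import zlib
from typing import List

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for the example


def block_crcs(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one CRC-32 per block of the data."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def corrupt_blocks(data: bytes, expected: List[int],
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indexes of blocks whose CRC-32 differs from the stored value.

    Assumes data and expected cover the same number of blocks.
    """
    actual = block_crcs(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# A bitmap-like buffer: long runs of 0x00 and 0xFF.
original = bytes(8192) + b"\xff" * 4096
stored = block_crcs(original)

# Simulate the corruption described above: an aligned 0x0000 word becomes 0xFFFF.
damaged = bytearray(original)
damaged[4096:4098] = b"\xff\xff"

print(corrupt_blocks(bytes(damaged), stored))
```

The stored CRCs would have to live somewhere that is itself verified, and a mismatch only tells you to re-read the block; it does not repair anything.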
Regards,
Edward Hibbert
Internet Applications Group
Data Connection Ltd
Tel: +44 131 662 1212
Fax: +44 131 662 1345
Email: eh@dataconnection.com
Web: http://www.dataconnection.com