Following some very nasty experiences with Veritas (vxvm & vxfs) on one of our systems, we are currently in paranoia mode about our ability to restore from backups, and have been doing more than usually extensive tests.
I have come across a serious problem with ONTAP restore, but I am in communication with NetApp about that and it's not the subject of this message. Maybe later on ... :-)
NetApp have said on a number of occasions that it's a design feature of their dump format that it is a compatible extension of good old BSD dump format, so that if all else fails one can feed the dumps to a BSD-type restore program (losing ACLs and such-like info, of course). Solaris "ufsrestore" is usually explicitly or implicitly mentioned.
So I have been testing that, and have fallen over a problem that I knew about (at least since May 2000, as I see I mentioned it in passing on toasters then) but have been ignoring. Maybe it's time to do something about it! What happens is that ufsrestore says
    write error extracting inode NNNNN, name ./path/name/to/file
    write: Bad address
and gives up. One can see that it has half-written the file involved. It seems to happen only on files that have holes (often many of them) at odd multiples of 4K: in our case they are usually *.pag files that are part of (n)dbm databases.
Now I very strongly suspect that this is a bug in Solaris ufsrestore, not in the NetApp dump contents, and I would like to be able to report it to Sun and get it fixed. But it seems to be very difficult to reproduce with a small example: it's far from the case that every file with oddly-aligned holes causes a problem, or even that a file with exactly the same hole pattern will provoke the bug if it occurs at a different point in the dump.
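For comparing candidate files, something along the following lines might help. It is only a sketch in Python, not anything I have run against the real dumps: it reads a file in 4K blocks and reports runs of all-zero blocks that start or end on an odd multiple of 4K. The names zero_runs and oddly_aligned are made up for illustration, and an all-zero block is only a candidate hole, since the contents alone cannot say whether the filing system actually left the block unallocated.

```python
BLOCK = 4096  # 4K, the alignment at which the troublesome holes appear


def zero_runs(path, block=BLOCK):
    """Return [(start_block, length_in_blocks), ...] for runs of all-zero blocks.

    An all-zero block is only a *candidate* hole: this looks at file
    contents, not at what the filing system has actually allocated.
    """
    runs = []
    run_start = None
    index = 0
    zero = bytes(block)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            # The last chunk may be short, so compare against a same-length slice.
            is_zero = chunk == zero[:len(chunk)]
            if is_zero and run_start is None:
                run_start = index
            elif not is_zero and run_start is not None:
                runs.append((run_start, index - run_start))
                run_start = None
            index += 1
    if run_start is not None:
        runs.append((run_start, index - run_start))
    return runs


def oddly_aligned(runs):
    """Keep the runs that start or end on an odd multiple of the block size,
    i.e. those that could not be holes on a filing system blocked at 8K."""
    return [(s, n) for (s, n) in runs if s % 2 == 1 or (s + n) % 2 == 1]
```

Running oddly_aligned(zero_runs(name)) over the *.pag files that fail and over some that do not would at least give two lists of hole patterns to compare.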
Even worse, it apparently depends on what sort of filing system ufsrestore is restoring into: I have never seen it happen when the target is a local ufs filing system, but often when it is an nfs one (usually on a NetApp filer, of course). If all these variables are reproduced, though, the effect is repeatable.
It's possible that Sun would reject such a bug report unless one could show that ufsrestore failed on a dump generated by Solaris ufsdump. Solaris ufs filing systems have always been blocked at 8K by default (so that hole boundaries must be on multiples of 8K). On UltraSPARC (sun4u) systems one can no longer even mount filing systems that are blocked at 4K. I tried making a ufsdump of a filing system blocked at 4K on a SPARCstation 5 (sun4m) - sometimes it's useful to have such an out of date machine as one's personal workstation! - with a suitably holey file in it, but I couldn't get the ufsrestore bug to show up... :-(
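For anyone who wants to try building a test case of their own, a "suitably holey file" can be made along the following lines. This is a minimal Python sketch, not the method used for the tests above: make_holey_file is a hypothetical helper, the filler bytes are arbitrary, and whether the skipped blocks really end up as holes depends on the block size of the filing system the file is written to.

```python
import os
import tempfile

BLOCK = 4096  # 4K


def make_holey_file(path, nblocks, hole_blocks, block=BLOCK):
    """Create a file of nblocks 4K blocks, never writing the blocks whose
    indices are in hole_blocks. Returns the resulting file size in bytes."""
    holes = set(hole_blocks)
    with open(path, "wb") as f:
        for i in range(nblocks):
            if i in holes:
                continue  # never written: a hole if the filing system allows it
            f.seek(i * block)
            # Non-zero filler that differs from block to block, so that
            # misplaced or missing data shows up in a restored copy.
            f.write(bytes([1 + i % 255]) * block)
        # Extend the file over a trailing hole, if there is one.
        f.truncate(nblocks * block)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        name = os.path.join(d, "test.pag")
        # Holes in every odd-numbered block, i.e. starting at 4K, 12K, 20K, ...
        size = make_holey_file(name, 64, range(1, 64, 2))
        # st_blocks is normally counted in 512-byte units; if the allocated
        # figure is well below the size, the holes were preserved.
        print("size:", size, "allocated:", os.stat(name).st_blocks * 512)
```

The real *.pag files have much less regular hole patterns than this, so the hole_blocks argument would probably need to be varied (or taken from a file known to fail) before it provoked anything.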
If anyone else has ever come across this problem, and/or has any suggestions on how to proceed with homing in on the bug, I would very much like to hear from them. The latest experiments were done with Solaris 8 ufsrestore as patched by 109091-05, but as I said above I believe the bug has been there for many years.
Chris Thompson               University of Cambridge Computing Service,
Email: cet1@ucs.cam.ac.uk    New Museums Site, Cambridge CB2 3QH,
Phone: +44 1223 334715       United Kingdom.