Following some very nasty experiences with Veritas (vxvm & vxfs) on one of our systems, we are currently in paranoia mode about our ability to restore from backups, and have been doing more than usually extensive tests.
I have come across a serious problem with ONTAP restore, but I am in communication with NetApp about that and it's not the subject of this message. Maybe later on ... :-)
NetApp have said on a number of occasions that it's a design feature of their dump format that it is a compatible extension of good old BSD dump format, so that if all else fails one can feed such dumps to a BSD-type restore program (losing ACLs and suchlike info, of course). Solaris "ufsrestore" is usually explicitly or implicitly mentioned.
So I have been testing that, and have fallen over a problem that I knew about (at least since May 2000, as I see I mentioned it in passing on toasters then) but have been ignoring. Maybe it's time to do something about it! What happens is that ufsrestore says
   write error extracting inode NNNNN, name ./path/name/to/file
   write: Bad address
and gives up. One can see that it has half-written the file involved. It seems to happen only on files that have holes (often many of them) at odd multiples of 4K: in our case they are usually *.pag files that are part of (n)dbm databases.
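For anyone who wants to experiment, here is a rough sketch of one way to construct a file with that sort of hole pattern. The filename and the particular block offsets are made up for illustration (they are not taken from our actual *.pag files); the point is just that the hole boundaries fall on odd multiples of 4K rather than 8K.

```shell
#!/bin/sh
# Sketch: build a sparse file whose holes have boundaries at odd
# multiples of 4K, roughly the shape of an (n)dbm *.pag file.
# "holey.pag" and the block offsets are illustrative, not real data.

f=holey.pag
rm -f "$f"

# Write one 4K block of (zero) data at blocks 0, 3, 4 and 9.  The
# unwritten gaps - blocks 1-2 (bytes 4K-12K) and blocks 5-8 (bytes
# 20K-36K) - are left as holes, with boundaries at odd multiples of 4K.
for blk in 0 3 4 9; do
    dd if=/dev/zero of="$f" bs=4096 seek="$blk" count=1 conv=notrunc 2>/dev/null
done

# Apparent size should be 10 blocks (40960 bytes); on a filing system
# that supports holes, du should report rather less than that.
ls -l "$f"
du -k "$f"
```

Whether any particular file of this shape actually provokes the ufsrestore failure is, as noted above, another matter entirely - in my experience it also depends on where the file falls in the dump.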
Now I very strongly suspect that this is a bug in Solaris ufsrestore, not in the NetApp dump contents, and I would like to be able to report it to Sun and get it fixed. But it seems to be very difficult to reproduce with a small example: it's far from the case that every file with oddly-aligned holes causes a problem, or even that a file with exactly the same hole pattern will provoke the bug if it occurs at a different point in the dump.
Even worse, it apparently depends on what sort of filing system ufsrestore is restoring into: I have never seen it happen with a local ufs filing system, but it often does with an nfs one (usually on a NetApp filer, of course). If all these variables are reproduced, though, the effect is repeatable.
It's possible that Sun would reject such a bug report unless one could show that it failed on a dump generated by Solaris ufsdump. Solaris ufs filing systems have always been blocked at 8K by default (so that hole boundaries must fall on multiples of 8K), and on UltraSPARC (sun4u) systems one can't even mount ones blocked at 4K any longer. I tried making a ufsdump of a filing system blocked at 4K on a SPARCstation 5 (sun4m) - sometimes it's useful to have such an out-of-date machine as one's personal workstation! - with a suitably holey file in it, but I couldn't get the ufsrestore bug to show up... :-(
If anyone else has ever come across this problem, and/or has any suggestions on how to proceed with homing in on the bug, I would very much like to hear from them. The latest experiments were done with Solaris 8 ufsrestore as patched by 109091-05, but as I said above I believe the bug has been there for many years.
Chris Thompson
University of Cambridge Computing Service,
New Museums Site, Cambridge CB2 3QH,
United Kingdom.
Email: cet1@ucs.cam.ac.uk
Phone: +44 1223 334715
On 29 August I wrote:

> Following some very nasty experiences with Veritas (vxvm & vxfs) on one of our systems, we are currently in paranoia mode about our ability to restore from backups, and have been doing more than usually extensive tests.
>
> I have come across a serious problem with ONTAP restore, but I am in communication with NetApp about that and it's not the subject of this message. Maybe later on ... :-)
Well, maybe it's time to reveal a little more about that.
Our problem was bug 76695 (we weren't the first to discover it). Introduced in ONTAP 6.2, it stops the *third* level of an incremental restore from ever working if you have a reasonably active filing system - or at any rate, that was our experience. (The first increment restored on top of the level 0 leaves the restore_symboltable file inconsistent with reality.)
This bug is not yet fixed in the recently announced 6.2.1R1 or 6.3 releases. There is a 6.2.1Dx release which fixes it, and I am sure NetApp will tell you about it if you ask nicely. That is what we are running at the moment. It still has a problem with incrementally restoring dumps involving more than one qtree (unless you use the Q option to forget the division into qtrees).
As regards the topic of my original toasters posting - problems restoring NetApp dumps with Solaris ufsrestore - I haven't had any feedback on that at all. Even negative feedback, e.g. "we do this all the time and it's never given us any trouble", would be welcome.
Chris Thompson
University of Cambridge Computing Service,
New Museums Site, Cambridge CB2 3QH,
United Kingdom.
Email: cet1@ucs.cam.ac.uk
Phone: +44 1223 334715