Some of you may recall Matt Phelps encountering some trouble with NDMP incremental backups.
We've identified the problem as bug 23356. The bug description will soon be available ONLINE at NOW.
In a nutshell, the problem is this: An NDMP incremental dump will not include files with an mtime which is set to a time BEFORE the timestamp of the last level 0 dump, but created AFTER the timestamp of the last level 0 dump.
Thus, your NDMP incremental backup may be missing some files that were created after the level 0 backup.
The files that will be affected would primarily be: 1) Files that are restored or untar'ed through NFS or CIFS, or by the native filer restore or NDMP restore. 2) Files whose permissions have changed without the actual file changing (we won't back up the changed permissions).
Note that the next level 0 backup WOULD include all of these files.
The history of this bug:
Some customers were encountering problems with various Windows applications using the Win32 API.
Apparently, the ctime of an accessed file is updated by a large number of Windows apps (even if the file is only read) because the apps don't understand "last changed time".
Thus, users with virus scanners would find that their incremental dumps included all files in the system...
So we added a hidden option, ignore_ctime, to NDMP and dump. (See burt 17767 online for more details on this issue.)
Unfortunately, all incremental NDMP backups incorrectly invoke that option as the default, and it cannot be overridden.
The solution:
We will soon have a 5.3.6 patch available that fixes this problem.
This is also a good time to mention the NOW Bug Watcher utility. When we do a patch release that fixes this bug, those watching the bug will be automatically notified. (I'm guessing it probably works for other bugs, too. ;)
http://now.netapp.com/cgi-bin/relfind?bugs=23356&rels=
And thanks to Matt for being so helpful with this issue.
Stephen Manley NDMP Spokesmodel
Some of you may recall Matt Phelps encountering some trouble with NDMP incremental backups.
We've identified the problem as bug 23356. The bug description will soon be available ONLINE at NOW.
In a nutshell, the problem is this: An NDMP incremental dump will not include files with an mtime which is set to a time BEFORE the timestamp of the last level 0 dump, but created AFTER the timestamp of the last level 0 dump.
Thus, your NDMP incremental backup may be missing some files that were created after the level 0 backup.
The files that will be affected would primarily be:
1) Files that are restored or untar'ed through NFS or CIFS, or by the native filer restore or NDMP restore.
2) Files whose permissions have changed without the actual file changing (we won't back up the changed permissions).
Note that the next level 0 backup WOULD include all of these files.
As I understand it, here are the unix rules that govern atime, mtime, and ctime. I don't know how CIFS fits into this scheme, but this is probably how NFS works:
atime - time data was last read from the file (ls -lu)
mtime - time data was last written to the file (ls -l)
ctime - time inode was last changed, except that an atime update caused by reading the file does not count (ls -lc)
When a file is first created, its atime, mtime and ctime are all set to the current time.
If you write data to the file, the mtime is updated. Since the mtime is stored in the inode, the ctime is also updated. The atime is NOT updated, so it is possible for mtime and ctime to be more recent than atime.
If you read data from the file, or execute a file, the atime is updated. Although the atime is in the inode, unix does not update the ctime in this case. This prevents dump programs from needlessly dumping a file that was read, but nothing else.
In the following circumstances, only the ctime is changed. The atime and mtime are unchanged. This is because data is neither read from the file nor written to the file:
o Changing owner, group, or permissions (all stored in inode).
o Renaming or moving a file with mv.
o adding or removing a hard link to a file with ln (link count stored in inode).
Even though these operations do not change any file data, they certainly should cause the file to be backed up again. So most unix dump programs include a file in an incremental dump if either mtime or ctime is recent enough.
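The rule above, and what happens when ctime is left out of it, can be sketched in a few lines of Python. This is only an illustration, not the filer's dump code: the FileTimes record, the in_incremental function, and its ignore_ctime parameter (named after the hidden option) are made up for the example, and the timestamps are plain integers.

```python
from dataclasses import dataclass


@dataclass
class FileTimes:
    """Hypothetical record of the two timestamps a dump program looks at."""
    name: str
    mtime: int  # time data was last written
    ctime: int  # time inode was last changed


def in_incremental(f: FileTimes, last_dump: int, ignore_ctime: bool = False) -> bool:
    """Return True if the file should go into an incremental dump.

    Usual rule: include the file if either mtime or ctime is newer than
    the previous dump. With ignore_ctime set, only mtime is consulted.
    """
    if f.mtime > last_dump:
        return True
    return (not ignore_ctime) and f.ctime > last_dump


LEVEL0 = 1000  # timestamp of the last level 0 dump (arbitrary units)

files = [
    FileTimes("edited.txt", mtime=1500, ctime=1500),   # written after level 0
    FileTimes("untarred.txt", mtime=200, ctime=1500),  # restored with an old mtime
    FileTimes("chmodded.txt", mtime=300, ctime=1500),  # permissions changed only
    FileTimes("untouched.txt", mtime=300, ctime=300),  # nothing happened
]

normal = [f.name for f in files if in_incremental(f, LEVEL0)]
mtime_only = [f.name for f in files if in_incremental(f, LEVEL0, ignore_ctime=True)]

print("mtime or ctime:", normal)
print("mtime only:    ", mtime_only)
```

With the mtime-only check, the untarred file and the chmod'ed file both drop out of the incremental, which matches the two cases listed in the bug report.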
Unix allows you to manually set the atime and mtime to an arbitrary value (see the "touch" command). tar, cpio, and restore use this feature when restoring a file. This causes the ctime to be updated to the current time because this is a change to the inode. Thus, when you un-tar files, if you "ls -lc" them, you'll see the time that the files were extracted from the tar file.
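Here is a small Python sketch of that behavior, assuming a POSIX-style filesystem. The helper name extract_with_old_mtime and the file name are made up for the example; os.utime is the standard call for setting atime and mtime, and it has no argument for ctime.

```python
import os
import tempfile
import time


def extract_with_old_mtime(path: str, old_mtime: float) -> os.stat_result:
    """Create a file, then set its atime and mtime back, as tar does on extract."""
    with open(path, "w") as fh:
        fh.write("restored data\n")
    os.utime(path, (old_mtime, old_mtime))  # (atime, mtime); ctime cannot be passed
    return os.stat(path)


with tempfile.TemporaryDirectory() as tmp:
    one_year_ago = time.time() - 365 * 24 * 3600
    st = extract_with_old_mtime(os.path.join(tmp, "restored.txt"), one_year_ago)

    print("mtime:", time.ctime(st.st_mtime))  # about a year ago
    print("ctime:", time.ctime(st.st_ctime))  # time of the "extraction", i.e. now
```

A dump that compares only mtime against the last level 0 timestamp would treat this file as old, even though it did not exist when the level 0 was taken.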
There is no way in unix to manually set ctime. Even the superuser can only do it indirectly, for example by resetting the system clock, doing a chmod, and then setting the system clock back (not recommended!). So for all intents and purposes, nonprivileged users cannot set ctime to an arbitrary value.
The history of this bug:
Some customers were encountering problems with various Windows applications using the Win32 API.
Apparently, the ctime of an accessed file is updated by a large number of Windows apps (even if the file is only read) because the apps don't understand "last changed time".
Thus, users with virus scanners would find that their incremental dumps included all files in the system...
This sounds like a similar problem with the unix "file" command on some systems. The "file" command tells you what type of file you have. Usually "file" has to read the first few bytes of the file to figure it out. This, of course, trips the atime. But then "file" tries to cover its tracks by setting the atime back to its original value. Unfortunately, this trips the ctime, which also causes a needless incremental dump of the file. So are you saying that virus scanners also save the atime and set it back? Makes sense, because otherwise the atime of every file on the system is reset whenever a virus scan is run. But resetting the atime trips the ctime, causing all files to be included in an incremental dump.
But there are important reasons to dump files whose ctimes have changed, so ignoring ctime isn't the answer.
So is the fix going to be to allow CIFS (and perhaps NFS) to reset the atime without tripping ctime? While this bends the unix rules a bit, I can't see a problem with it. I guess if this behavior is required by a standard, then netapp would not want to go out of compliance.
A possible workaround would be to have the virus scanner check a snapshot. Then the virus scanner can't change any timestamps at all since the snapshot is strictly read only. Of course, the virus scanner might croak when its attempt to reset atimes in a snapshot fails.
So we added a hidden option, ignore_ctime, to NDMP and dump. (See burt 17767 online for more details on this issue.)
Unfortunately, all incremental NDMP backups incorrectly invoke that option as the default, and it cannot be overridden.
The solution:
We will soon have a 5.3.6 patch available that fixes this problem.
This is also a good time to mention the NOW Bug Watcher utility. When we do a patch release that fixes this bug, those watching the bug will be automatically notified. (I'm guessing it probably works for other bugs, too. ;)
http://now.netapp.com/cgi-bin/relfind?bugs=23356&rels=
And thanks to Matt for being so helpful with this issue.
Stephen Manley NDMP Spokesmodel
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support
The history of this bug:
Some customers were encountering problems with various Windows applications using the Win32 API.
Apparently, the ctime of an accessed file is updated by a large number of Windows apps (even if the file is only read) because the apps don't understand "last changed time".
Thus, users with virus scanners would find that their incremental dumps included all files in the system...
This sounds like a similar problem with the unix "file" command on some systems. The "file" command tells you what type of file you have. Usually "file" has to read the first few bytes of the file to figure it out. This, of course, trips the atime. But then "file" tries to cover its tracks by setting the atime back to its original value. Unfortunately, this trips the ctime, which also causes a needless incremental dump of the file. So are you saying that virus scanners also save the atime and set it back? Makes sense, because otherwise the atime of every file on the system is reset whenever a virus scan is run. But resetting the atime trips the ctime, causing all files to be included in an incremental dump.
But there are important reasons to dump files whose ctimes have changed, so ignoring ctime isn't the answer.
Agreed. That's why the workaround of "ignore_ctime" is not a standard documented option, and should not be used by most NetApp customers.
Those who have used it understand the ramifications of their decision.
Longer-term solutions to the problem are mentioned in the bug description for burt 17767. One possibility is to allow CIFS users to choose to use the "Archive bit" semantic.
The archive bit is the Windows way of not worrying about any of the mtime/ctime business. I believe it is cleared when the file is backed up, and set when the file is modified...
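As a rough illustration of that semantic (a toy model only, not how the filer or any Windows backup product implements it), the archive bit can be thought of as a per-file flag. The WinFile class and incremental_backup function below are made up for the example.

```python
class WinFile:
    """Toy model of a file carrying a Windows-style archive bit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.archive = True  # a new file has never been backed up

    def modify(self) -> None:
        self.archive = True  # any write marks the file as needing backup

    def read(self) -> None:
        pass  # reading leaves the archive bit alone


def incremental_backup(files):
    """Back up every file whose archive bit is set, then clear the bit."""
    saved = [f.name for f in files if f.archive]
    for f in files:
        f.archive = False
    return saved


a, b = WinFile("a.doc"), WinFile("b.doc")
first = incremental_backup([a, b])   # both files are new
a.read()                             # e.g. a virus scan
b.modify()
second = incremental_backup([a, b])  # only b.doc
print(first, second)
```

In this model a read-only pass such as a virus scan never causes a file to be dumped again, because no timestamp is consulted at all.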
Priority 1 was to fix burt 23356, so that those of you who aren't worried about burt 17767 are not affected by it.
We are working on the "right" answer for 17767.
Stephen Manley NDMP Time Keeper
P.S.
A possible workaround would be to have the virus scanner check a snapshot. Then the virus scanner can't change any timestamps at all since the snapshot is strictly read only. Of course, the virus scanner might croak when its attempt to reset atimes in a snapshot fails.
BTW -- this was a suggested approach for the virus scanners. I don't have data on the success rate of this choice.