Folks,
We have a problem with files disappearing from a filer. The box is a 520 running 5.2.1. THREE times now, we've mounted one of its volumes to a production box, done tar cvf - | ( cd mount_point; tar xvf -), diff -r with no errors -- and then a day or two later the files that were copied are gone. Directories are there still, and symbolic links, but NO FILES.
Snapshots were turned off after the second time it happened; snap reserve is set to 0% and snap sched is 0 0 0 for that volume. There are no remarks in messages -- nothing besides [statd] time, up x days. The directories show size but ls returns nothing.
This is the volume on the netapp, mounted on the sun box:
idsmajor:root:/mnt/etc>mount -p | grep /tmp/a
sanihome:/vol/nfs_archive - /tmp/a nfs - no rw
idsmajor:root:/mnt/etc>ls -l /tmp/a | head
total 81288
drwxr-xr-x 2 ids users 344064 Dec 19 05:29 apr_1995
drwxr-xr-x 2 ids users 393216 Dec 19 05:31 apr_1996
drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 apr_1997
drwxr-xr-x 2 ids users 114688 Dec 19 05:52 aug_1994
drwxr-xr-x 2 ids users 380928 Dec 19 05:53 aug_1995
drwxr-xr-x 2 ids users 409600 Dec 19 05:55 aug_1996
drwxr-xr-x 2 ids users 4227072 Dec 19 06:16 aug_1997
drwxr-xr-x 2 ids users 282624 Dec 19 06:18 dec_1994
drwxr-xr-x 2 ids users 290816 Dec 19 06:19 dec_1995
idsmajor:root:/mnt/etc>ls -als /tmp/a/* | head -20
0 lrwxrwxrwx 1 ids users 29 Dec 7 16:34 /tmp/a/dec_1997 -> /archive/nfs_archive/dec_1997
0 lrwxrwxrwx 1 ids users 29 Dec 7 23:37 /tmp/a/nov_1997 -> /archive/nfs_archive/nov_1997
0 lrwxrwxrwx 1 ids users 29 Dec 8 01:05 /tmp/a/oct_1997 -> /archive/nfs_archive/oct_1997
0 lrwxrwxrwx 1 ids users 29 Dec 8 01:54 /tmp/a/sep_1997 -> /archive/nfs_archive/sep_1997

/tmp/a/apr_1995:
total 688
680 drwxr-xr-x 2 ids users 344064 Dec 19 05:29 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..

/tmp/a/apr_1996:
total 784
776 drwxr-xr-x 2 ids users 393216 Dec 19 05:31 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..

/tmp/a/apr_1997:
total 7808
7800 drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..
idsmajor:root:/mnt/etc>
There were no problems with applications pulling files from these directories on the netapp on Friday, Dec 18. This volume is not mounted anywhere else. This is a production system -- nothing writes to the mount point.
What happened Saturday morning at 5 AM? Why did all these directories get touched?
Dave
Quoting Dave Toal (dave_toal@t-t.com):
We have a problem with files disappearing from a filer.
...
idsmajor:root:/mnt/etc>mount -p | grep /tmp/a
sanihome:/vol/nfs_archive - /tmp/a nfs - no rw
...
What operating systems are mounting the filer? Mounting filesystems that you care about in /tmp is generally a bad idea, as various OS flavors may try to randomly delete things since they view it as temporary space.
For instance, numerous Red Hat Linux releases (4.1 among them, not sure about the newer ones) have shipped with the following in /etc/crontab:
# Remove /tmp, /var/tmp files not accessed in 10 days (240 hours)
41 02 * * * root /usr/sbin/tmpwatch 240 /tmp /var/tmp
Naturally, a great deal of damage can occur from this sort of script, depending on the modes of the files, export permissions on the NFS server, etc. If this is your problem, you wouldn't be the first! Snapshots save the day in this scenario, if they are not disabled....
Paul Eastham NetApp Engineering
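To see what an access-time-based cleaner like the crontab entry quoted above would consider fair game, here is a minimal dry-run sketch in Python. It is not tmpwatch and makes no claim to match its exact rules; the function name `stale_files` is made up for illustration. It only lists regular files whose last access time is older than a cutoff and deletes nothing.

```python
import os
import stat
import time

def stale_files(root, max_age_hours, now=None):
    """List regular files under root whose atime is older than max_age_hours.

    Hypothetical helper for illustration only; it never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect the link itself, not its target
            if stat.S_ISREG(st.st_mode) and st.st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling `stale_files("/tmp/a", 240)` on a client would show which files under the mount point have an access time more than 10 days old. If atime updates are disabled on the server, that could be nearly everything.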
* Dave Toal (dave_toal@t-t.com) done spit this rhetoric:
idsmajor:root:/mnt/etc>ls -als /tmp/a/* | head -20
There were no problems with applications pulling files from these directories on the netapp on Friday, Dec 18. This volume is not mounted anywhere else. This is a production system -- nothing writes to the mount point.
I bet if you export the volume read only, the problem goes away ;)
What happened Saturday morning at 5 AM? Why did all these directories get touched?
If I had to take a WAG, I'd say that this was a RedHat Linux system, and that /etc/cron.daily/tmpwatch was blowing away all the files (it checks for files not accessed in 10 days, and you probably have the no_atime_update option turned on).
If that's not it, then I'd look for other /tmp cleaning scripts.
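One rough way to hunt for such scripts is to scan the cron files on each client for active lines that mention a temp directory. The Python sketch below is an assumption-laden starting point: the helper names and the default search strings are invented here, and the paths you pass in (for example /etc/crontab and /etc/cron.daily, the two mentioned in this thread) will differ by OS.

```python
import os

def expand(paths):
    """Expand directories into the files directly inside them."""
    out = []
    for p in paths:
        if os.path.isdir(p):
            out.extend(sorted(os.path.join(p, n) for n in os.listdir(p)))
        else:
            out.append(p)
    return out

def find_tmp_cleaners(paths, needles=("/tmp", "/var/tmp")):
    """Return (path, line_number, text) for non-comment lines naming a temp dir.

    Hypothetical helper: a plain substring match, so expect false positives.
    """
    hits = []
    for path in expand(paths):
        try:
            with open(path, "r", errors="replace") as fh:
                lines = fh.readlines()
        except OSError:
            continue  # missing, unreadable, or not a regular file: skip it
        for lineno, line in enumerate(lines, start=1):
            text = line.strip()
            if not text or text.startswith("#"):
                continue  # ignore blank lines and comments
            if any(n in text for n in needles):
                hits.append((path, lineno, text))
    return hits
```

For example, `find_tmp_cleaners(["/etc/crontab", "/etc/cron.daily"])` returns every active line that references /tmp or /var/tmp, which you can then read by hand to decide whether it deletes anything.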