Your NFS client most likely has an entry in the crontab to delete all files in the /tmp directory (or all files older than a certain amount of time); this script is probably not smart enough to tell files on local storage from remote storage.
Two things jumped out at me immediately:
1) You're mounting the filer on /tmp.
2) The deletions occur at 5:00am Saturday, which is prime cron time.
Note also the ascending modification times of the directories -- as the file-removing script swept through them one by one, each deletion updated the modification time of the directory it happened in.
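The effect described above can be checked on any scratch directory. The following is a minimal Python sketch, not anything from the original thread: the function name and the file name are hypothetical, and it assumes a filesystem with ordinary POSIX semantics, where removing an entry counts as a modification of the parent directory.

```python
import os
import tempfile
import time


def mtime_changes_on_delete() -> bool:
    """Return True if deleting a file changes its parent directory's mtime."""
    with tempfile.TemporaryDirectory() as parent:
        victim = os.path.join(parent, "victim.dat")  # hypothetical file name
        with open(victim, "w") as handle:
            handle.write("data")

        # Backdate the directory so the change is visible regardless of
        # the filesystem's timestamp granularity.
        old = time.time() - 3600
        os.utime(parent, (old, old))
        before = os.stat(parent).st_mtime

        # Unlinking an entry modifies the directory itself.
        os.remove(victim)
        after = os.stat(parent).st_mtime
        return after > before


if __name__ == "__main__":
    print(mtime_changes_on_delete())
```

A sweep that empties directories one after another would therefore leave a trail of steadily increasing directory timestamps, which matches the listing quoted in this thread.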
The moral: NEVER, EVER, EVER NFS mount anything on or in /tmp. :)
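For illustration only, here is a hedged Python sketch of how a cleanup job could avoid this trap by refusing to leave the filesystem it started on. It is not the script that actually ran on the client; the function name `stale_local_files` and its parameters are assumptions made for this example. It compares the device number (`st_dev`) of every entry against that of the starting directory, which is the same idea as a filesystem-bounded `find`.

```python
import os
import stat
import time


def stale_local_files(root, max_age_days, now=None):
    """Yield regular files under root older than max_age_days,
    without ever leaving the filesystem that root lives on."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    root_dev = os.lstat(root).st_dev

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that live on a different device (for
        # example an NFS mount), so the walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            # Only regular files on the same device are candidates;
            # symbolic links and other special entries are left alone.
            if (stat.S_ISREG(info.st_mode)
                    and info.st_dev == root_dev
                    and info.st_mtime < cutoff):
                yield path


if __name__ == "__main__":
    # Report only; a real cleaner would remove the paths it yields.
    for candidate in stale_local_files("/tmp", 7):
        print(candidate)
```

The sketch only reports candidates and deletes nothing. Even with such a guard in place, the advice above still stands: a safeguard in one script does not protect a mount from every other tool that treats /tmp as disposable.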
--
Eric Barrett / Technical Support Engineer, Network Appliance
Direct: 1-408-822-4779 / Pager: 1-408-939-7945
Get answers NOW! - NetApp On the Web - http://now.netapp.com
-----Original Message-----
From: Dave Toal [mailto:dave_toal@t-t.com]
Sent: Monday, December 20, 1999 4:11 PM
To: jq@opensystems.com; toasters@mathworks.com; unixgroup@t-t.com; support@opensystems.com
Subject: Files disappear after some random interval
Folks,
We have a problem with files disappearing from a filer.
The box is a 520 running 5.2.1. THREE times now, we've mounted one of its volumes to a production box, done tar cvf - | ( cd mount_point; tar xvf -), diff -r with no errors -- and then a day or two later the files that were copied are gone. Directories are there still, and symbolic links, but NO FILES.
Snapshots were turned off after the second time it happened; snap reserve is set to 0% and snap sched is 0 0 0 for that volume. There are no remarks in messages -- nothing besides [statd] time, up x days. The directories show size but ls returns nothing.
This is the volume on the netapp, mounted on the sun box:
idsmajor:root:/mnt/etc>mount -p | grep /tmp/a
sanihome:/vol/nfs_archive - /tmp/a nfs - no rw
idsmajor:root:/mnt/etc>ls -l /tmp/a | head
total 81288
drwxr-xr-x 2 ids users 344064 Dec 19 05:29 apr_1995
drwxr-xr-x 2 ids users 393216 Dec 19 05:31 apr_1996
drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 apr_1997
drwxr-xr-x 2 ids users 114688 Dec 19 05:52 aug_1994
drwxr-xr-x 2 ids users 380928 Dec 19 05:53 aug_1995
drwxr-xr-x 2 ids users 409600 Dec 19 05:55 aug_1996
drwxr-xr-x 2 ids users 4227072 Dec 19 06:16 aug_1997
drwxr-xr-x 2 ids users 282624 Dec 19 06:18 dec_1994
drwxr-xr-x 2 ids users 290816 Dec 19 06:19 dec_1995
idsmajor:root:/mnt/etc>ls -als /tmp/a/* | head -20
0 lrwxrwxrwx 1 ids users 29 Dec 7 16:34 /tmp/a/dec_1997 -> /archive/nfs_archive/dec_1997
0 lrwxrwxrwx 1 ids users 29 Dec 7 23:37 /tmp/a/nov_1997 -> /archive/nfs_archive/nov_1997
0 lrwxrwxrwx 1 ids users 29 Dec 8 01:05 /tmp/a/oct_1997 -> /archive/nfs_archive/oct_1997
0 lrwxrwxrwx 1 ids users 29 Dec 8 01:54 /tmp/a/sep_1997 -> /archive/nfs_archive/sep_1997

/tmp/a/apr_1995:
total 688
680 drwxr-xr-x 2 ids users 344064 Dec 19 05:29 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..

/tmp/a/apr_1996:
total 784
776 drwxr-xr-x 2 ids users 393216 Dec 19 05:31 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..

/tmp/a/apr_1997:
total 7808
7800 drwxr-xr-x 2 ids users 3989504 Dec 19 05:51 .
8 drwxr-xr-x 44 ids users 4096 Dec 20 13:33 ..
idsmajor:root:/mnt/etc>
There were no problems with applications pulling files from these directories on the netapp on Friday, Dec 18. This volume is not mounted anywhere else. This is a production system -- nothing writes to the mount point.
What happened Saturday morning at 5 AM? Why did all these directories get touched?
Dave
Eric,
Um... solaris 2.6...
D'OH!!!
<shakes head> I bet the filesystem was left mounted on /tmp after the copies were done.
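A forgotten mount like this is easy to check for mechanically. The Python sketch below is an illustration under stated assumptions rather than a tested Solaris tool: it assumes the whitespace-separated layout seen in the `mount -p` line quoted in this thread (device, a placeholder column, mount point, filesystem type, then further columns), and the function name `nfs_mounts_under` is hypothetical.

```python
def nfs_mounts_under(mount_lines, danger_dir="/tmp"):
    """Return (mount_point, device) pairs for NFS mounts on or under danger_dir.

    Each line is assumed to look like the one quoted in this thread:
        sanihome:/vol/nfs_archive - /tmp/a nfs - no rw
    """
    flagged = []
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a mount entry in the assumed layout
        device, mount_point, fstype = fields[0], fields[2], fields[3]
        if fstype != "nfs":
            continue  # local filesystems are not the concern here
        # Match the directory itself or anything below it, but not
        # unrelated paths that merely share the prefix (e.g. /tmpdata).
        if mount_point == danger_dir or mount_point.startswith(danger_dir + "/"):
            flagged.append((mount_point, device))
    return flagged


if __name__ == "__main__":
    sample = ["sanihome:/vol/nfs_archive - /tmp/a nfs - no rw"]
    for mount_point, device in nfs_mounts_under(sample):
        print(f"WARNING: {device} is NFS-mounted on {mount_point}")
```

Fed the mount line from the original report, the function flags /tmp/a; how the lines are collected on a given system is left to the reader.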
Thanks very much. I may have to give back my Solaris merit badge for this one... :-/
Dave
On Dec 20, Dave Toal wrote:
Thanks very much. I may have to give back my Solaris merit badge for this one... :-/
Don't feel bad. We did this to one of our engineering file servers a while back.
Ouch!
(And thank God for snapshots.)
Dave
One of the top rules of system administration I learned early on is to always try the simple explanation first, even if you're sure that can't possibly be it. Much of the time the OS and applications are running fine, and the problem is something simple rather than some obscure, complex failure condition. How many times have you wasted an hour debugging a TCP/IP problem before realizing you really DID have the IP address wrong, or had DNS give you error messages that you just didn't believe at face value, and spent time looking elsewhere for the problem? Always check the simple stuff first!
Bruce