I have four filers that are nearing production status, and the hardware and software configurations were frozen this morning. To start everything fresh, I zeroed all the disks and recreated the filesystem on all four filers, untarred the 4.2a distribution, restored the rc, serialnum, hosts and hosts.equiv files, downloaded the boot code to the drives, and rebooted.
Once everyone was back online, I noticed a discrepancy between the 'df' and 'du' outputs. 'df' says 40MB are used on two of the filers, and 4.5MB on the other two. 'du' from the admin host consistently reports about 4MB used. Where is that space on adm1-na2 and adm1-na3 going? Old snapshots (all of which I've deleted)? The only files ever to have lived on these filesystems are the OS-related ones in /etc. All option settings are the same, maxfiles is set to the default, etc.
admin# df -k /na1 /na2 /na3 /na4
Filesystem            kbytes     used    avail capacity  Mounted on
adm1-na1:/          33007568     4520 33003048     1%    /na1
adm1-na2:/          33007568    40192 32967376     1%    /na2
adm1-na3:/          33007568    40184 32967384     1%    /na3
adm1-na4:/          33007568     4408 33003160     1%    /na4

admin# du -sk /na1 /na2 /na3 /na4
4048    /na1
4048    /na2
4048    /na3
4048    /na4
admin# rsh adm1-na1 df
Filesystem            kbytes     used    avail capacity  Mounted on
/                   33007568     4520 33003048     0%    /
/.snapshot                 0        0        0   ---%    /.snapshot

admin# rsh adm1-na2 df
Filesystem            kbytes     used    avail capacity  Mounted on
/                   33007568    40192 32967376     0%    /
/.snapshot                 0        0        0   ---%    /.snapshot

admin# rsh adm1-na3 df
Filesystem            kbytes     used    avail capacity  Mounted on
/                   33007568    40180 32967388     0%    /
/.snapshot                 0        0        0   ---%    /.snapshot

admin# rsh adm1-na4 df
Filesystem            kbytes     used    avail capacity  Mounted on
/                   33007568     4404 33003164     0%    /
/.snapshot                 0        0        0   ---%    /.snapshot
admin# rsh adm1-na1 snap list
working...........
  %/used     %/total   date          name
---------- ---------- ------------ --------

admin# rsh adm1-na2 snap list
working...........
  %/used     %/total   date          name
---------- ---------- ------------ --------

admin# rsh adm1-na3 snap list
working...........
  %/used     %/total   date          name
---------- ---------- ------------ --------

admin# rsh adm1-na4 snap list
working...........
  %/used     %/total   date          name
---------- ---------- ------------ --------
My guess:
Discrepancy between du and df on an empty filesystem: I suspect it is the meta-data files. These are invisible files that hold information about the filesystem, and they are an important part of the WAFL filesystem. "df" might report space used by these files, while "du" probably won't.
Note, if the filesystem is "brand new" in the sense that a new filesystem was created with a floppy boot, then "df" will probably show less space than it would if the filesystem was simply deleted. This is because we fill in holes in the meta-data files as the filesystem fills up. In a new filesystem, these files are almost entirely holes. In a "removed" filesystem that used to be almost full, or held a lot of inodes or blocks, the space allocated to the meta-data files is probably still allocated.
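Ken's point about holes can be illustrated on any Unix box that supports sparse files (a generic sketch, not filer code): a file can have a large logical size while occupying almost no disk blocks, just like a freshly created meta-data file that is "almost entirely holes".

```python
# Generic sparse-file demo (plain Unix, not WAFL): a file whose logical
# size is ~100 MB but which occupies almost no actual disk blocks.
import os

with open("sparse.dat", "wb") as f:
    f.seek(100 * 1024 * 1024)   # seek far past end-of-file...
    f.write(b"x")               # ...and write a single byte

st = os.stat("sparse.dat")
print(st.st_size)               # logical size: 104857601 bytes
print(st.st_blocks * 512)       # allocated bytes: a few KB on a sparse-capable fs
```

A "du" of this file counts only the allocated blocks, which is exactly why it can disagree with a tool that looks at logical sizes.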
Hope this helps.
Ken.
On Thu, 20 Nov 1997, Kenneth Whittaker wrote:
Discrepancy between du and df on an empty filesystem. I suspect that it is the meta-data files. There are invisible files that hold information about the filesystem.
New information: I bumped up maxfiles on all four filers to 2.5 million inodes. adm1-na1 and adm1-na4 both printed the warning about having more than twice as many files as needed, but adm1-na2 and adm1-na3 did not. However, after all four were set to 2.5 million files, the df output did not change (even after unmounting and remounting the filesystems, rebooting the filers, etc.).
As I said previously, maxfiles on the filers should have been identical, assuming that previous maxfiles settings are lost when the filesystem is wiped clean and recreated. I'm going to wipe everything out again after the latest rounds of tests, and I'll try to reproduce this condition.
admin# df -k /na1 /na2 /na3 /na4
Filesystem            kbytes     used    avail capacity  Mounted on
adm1-na1:/          33007568     4520 33003048     1%    /na1
adm1-na2:/          33007568    40192 32967376     1%    /na2
adm1-na3:/          33007568    40184 32967384     1%    /na3
adm1-na4:/          33007568     4408 33003160     1%    /na4
On Thu, 20 Nov 1997, Kenneth Whittaker wrote:
Discrepancy between du and df on an empty filesystem. I suspect that it is the meta-data files. There are invisible files that hold information about the filesystem.
New information: I bumped up the maxfiles on all four filers to
2.5 million inodes. adm1-na1 and adm1-na4 both printed the warning about having more than twice as many files as needed, but adm1-na2 and adm1-na3 did not. However, after all four were set to 2.5 million files, the df output did not change (even after unmounting and remounting the filesystems, rebooting the filers, etc.).
As Ken mentioned, when meta-data files are created, they are just a large "hole". That is, the size of the file is large, but there aren't actually any blocks in it because it contains nothing but zeros.
So running "maxfiles" doesn't actually consume any space. The space only gets allocated when you create a new file with an inode number that places it in a new part of the inode file.
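Dave's lazy-allocation behavior can be mimicked with an ordinary sparse file (a rough analogy on plain Unix, not filer internals): growing a file costs nothing, and blocks appear only when you write into a previously untouched region.

```python
# Rough analogy to maxfiles (plain Unix, not WAFL internals):
# growing a file leaves a hole that costs no blocks; writing into a
# previously untouched region is what actually allocates space.
import os

with open("inodefile.dat", "wb") as f:
    f.truncate(10 * 1024 * 1024)    # like raising maxfiles: big hole, no blocks

before = os.stat("inodefile.dat").st_blocks

with open("inodefile.dat", "r+b") as f:
    f.seek(5 * 1024 * 1024)         # like creating a file whose inode number
    f.write(b"\x01" * 4096)         # lands in a new part of the inode file

after = os.stat("inodefile.dat").st_blocks
print(before, after)                # "after" grows; "before" is typically 0
```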
Dave
On Sat, 22 Nov 1997, Dave Hitz wrote:
So running "maxfiles" doesn't actually consume any space. The space only gets allocated when you create a new file with an inode number that places it in a new part of the inode file.
So the maxfiles table is like a sparse file (wow, haven't used that term since my Apple II days) without an inode number? I think the discrepancy in 'df' output from my first message may have been caused by coincidental timing of snapshots.
I rezeroed all the filers again, and watched the df output after each stage of setting up a filer. They all started out the same, but then the disk usage appears to jump when the first snapshot is created. I think what happened the first time was that I was able to "snap sched 0" two of the filers, was distracted by other work, then continued working on the other two. In the meantime, the second pair of filers went ahead and created a snapshot, which seems to be correlated with a change in 'df' output (even though no real files were added or deleted).
Anyway, just more of my curious musings, in my attempts to see how this Netapp contraption works. ;-)
So the maxfiles table is like a sparse file (wow, haven't used
that term since my Apple II days) without an inode number?
Yep. Check out the WAFL paper under the "technology" / "architecture" section of our web page for details.
All of the meta-data in WAFL is stored in hidden files with inode numbers in the range 32-63, except for the inode file itself, whose inode is stored at a fixed location on disk where it can be found at boot time. (Multiple fixed locations, actually.)
I rezeroed all the filers again, and watched the df output after
each stage of setting up a filer. They all started out the same, but then the disk usage appears to jump when the first snapshot is created.
Oh -- another meta-data file that starts out sparse is the blkmap file, which keeps track of which blocks are used in the active filesystem and in the snapshots.
The first time you create a snapshot, WAFL marches through the whole blkmap file, copying the active filesystem bitplane into the snapshot bitplane, which faults in the whole file. (If you don't create a snapshot, then the blkmap file will be faulted in over time as WAFL scans through the disks allocating space for newly written data.)
That file is one MB per GB of disk space, so after the first snapshot in a brand new 100 GB filesystem, you should lose about 100 MB.
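Dave's sizing rule is easy to sanity-check against the numbers in this thread (a back-of-the-envelope sketch; the nine 4 GB data drives are mentioned later in the thread, and all figures are approximate):

```python
# Back-of-the-envelope check of "1 MB of blkmap per GB of disk space".
# The 9 x 4 GB drive count comes from later in the thread; all figures
# are approximate.
def first_snapshot_cost_mb(fs_gb):
    """Blkmap space faulted in by the first snapshot, at ~1 MB per GB."""
    return fs_gb * 1

fs_gb = 9 * 4                          # nine 4 GB data drives, ~36 GB
print(first_snapshot_cost_mb(fs_gb))   # -> 36
# ~36 MB of blkmap plus the ~4 MB in /etc lands near the 40 MB df reported.
```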
Anyway, just more of my curious musings, in my attempts to see how
this Netapp contraption works. ;-)
Black-box reverse engineering at its best. You remind me of some of the engineers here at NetApp. "Hey! Check out THIS new way to kill a filer..."
NetApp Engineering: Where sadism is job #1.
Dave
On Tue, 25 Nov 1997, Dave Hitz wrote:
All of the meta-data in WAFL is stored in hidden files with inode numbers in the range 32-63, except for the inode file itself, whose inode is stored at a fixed location on disk where it can be found at boot time. (Multiple fixed locations, actually.)
So when you use maxfiles to increase the number of inodes, both the inode file and the inode-map file need to be grown? I noticed that deleting a single 2GB file can take several seconds of 100% CPU usage on an F230. Is this simply because the filer has to work its way down 6 or 7 layers of inode indirect blocks (by my calculations) and gather up the half million or so 4K blocks? Still, it's a little faster than an Ultra-1 running Solaris doing the same thing to a local filesystem. ;-)
That file is one MB per GB of disk space, so after the first snapshot in a brand new 100 GB filesystem, you should lose about 100 MB.
That sounds about right then... a total of 40MB was used after the first snapshot (including the stuff in /etc), and the filers have 9 4GB data drives.
Black-box reverse engineering at it's best. You remind me of some of the engineers here at NetApp. "Hey! Check out THIS new way to kill a filer..."
Or...
"Engineer, it hurts when I do *this*!" "Cool! Hold still and let me try it..."
So when you use maxfiles to increase the number of inodes, both
the inode file and the inode-map file need to be grown?
Yep.
I noticed that deleting a single 2GB file can take several seconds of 100% CPU usage on an F230. Is this simply because the filer has to work its way down 6 or 7 layers of inode indirect blocks (by my calculations) and gathering up the half million or so 4K blocks?
Yep again. There's just lots of indirect blocks to clear, and blkmap entries to update.
Files up to 64 bytes fit in the inode itself. Files up to 64K use the 64 bytes in the inode as 16 direct pointers (16*4K = 64K). Files up to 64M contain 16 singly indirect pointers (16*1024*4K = 64M). And with 16 doubly indirect pointers, you can get up to 64G.
Note that this is different from UFS, in which the first 10 pointers are always direct, the 11th is singly indirect, 12th doubly indirect, etc. The WAFL way makes the math easier, it takes less indirection to handle larger files, and file traversal falls out as nicely recursive.
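Dave's thresholds can be verified with a few lines (a sketch built only from the numbers he gives: 4 KB blocks, 64 pointer bytes per inode, and 1024 pointers per 4 KB indirect block):

```python
# Maximum file size at each WAFL indirection depth, per Dave's numbers:
# 64 bytes of pointer space per inode (16 x 4-byte pointers), 4 KB blocks,
# and 1024 four-byte pointers per 4 KB indirect block.
BLOCK = 4 * 1024
PTRS_PER_INODE = 16
PTRS_PER_BLOCK = 1024

def max_file_size(depth):
    """depth 0: data in the inode; 1: direct; 2: singly; 3: doubly indirect."""
    if depth == 0:
        return 64                       # data fits in the inode itself
    return PTRS_PER_INODE * PTRS_PER_BLOCK ** (depth - 1) * BLOCK

for depth, label in enumerate(["in-inode", "direct", "singly indirect",
                               "doubly indirect"]):
    print(label, max_file_size(depth))
# -> 64 bytes, 64 KB, 64 MB, 64 GB
```

Because all 16 inode pointers switch level together, the recursion is uniform, which is the "math is easier" point above.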
Dave
Brian Tao wrote:
On Thu, 20 Nov 1997, Kenneth Whittaker wrote:
Discrepancy between du and df on an empty filesystem. I suspect that it is the meta-data files. There are invisible files that hold information about the filesystem.
New information: I bumped up the maxfiles on all four filers to
2.5 million inodes. adm1-na1 and adm1-na4 both printed the warning about having more than twice as many files as needed, but adm1-na2 and adm1-na3 did not. However, after all four were set to 2.5 million files, the df output did not change (even after unmounting and remounting the filesystems, rebooting the filers, etc.).
That is correct: when you increase maxfiles, a big hole is stuck at the end of the appropriate meta-data files. A hole, as some people will agree, consumes no blocks.
Try this from a Solaris box running perl. First check your "df" output, then:

# create 100,000 files in an empty directory
perl -e 'for ($i = 0; $i < 100000; $i++) { open(F, ">$i") }'

# delete the files
perl -e 'for ($i = 0; $i < 100000; $i++) { unlink($i) }'

You just filled in a bunch of holes. Now check your "df" output again.
As I said previously, maxfiles on the filers should have been
identical, assuming that previous maxfiles settings are lost when the filesystem is wiped clean and recreated. I'm going to wipe everything out again after the latest rounds of tests, and I'll try to reproduce this condition.
maxfiles does not explain away anything, and I don't believe maxfiles has anything to do with this problem whatsoever. I have heard people mention maxfiles in association with this problem, but I have ignored it thus far.
admin# df -k /na1 /na2 /na3 /na4
Filesystem            kbytes     used    avail capacity  Mounted on
adm1-na1:/          33007568     4520 33003048     1%    /na1
adm1-na2:/          33007568    40192 32967376     1%    /na2
adm1-na3:/          33007568    40184 32967384     1%    /na3
adm1-na4:/          33007568     4408 33003160     1%    /na4
-- Brian Tao (BT300, taob@netcom.ca) "Though this be madness, yet there is method in't"
Oh yeah, if you try this, make sure it is a new filesystem, or it may not show a difference in the "df" output.