Okay, now I am really really confused.
Here's a summary of my understanding; maybe someone can point out where I'm off track.
1. WAFL blocks are 4 KB
2. Max # of inodes is one per 4 K block
3. Each directory, even an empty one, fills at least one block
4. But, up to 128 empty directories can be squeezed into one block
So based on 3, I will run out of room before running out of inodes, but based on 4, I will run out of inodes before running out of room. (Maybe the problem is I don't understand what a dirent is?)
Slightly off topic, can anyone suggest possibly a different, non-NetApp storage/file system that will give me a lot more inodes than I can get out of a filer? (Sorry Greg, but I need the best platform to meet the needs of the application, which might or might not be NetApp, so I have to ask.) The IO demands of this application are quite low, but storage and inode requirements are high. So any solution needs to be able to store large amounts of data and provide massive quantities of inodes, but doesn't need to be as fast as a filer.
Thanks, Tom
MadDog@fool.com (Tom "Mad Dog" Yergeau) writes:
Okay, now I am really really confused.
Here's a summary of my understanding; maybe someone can point out where I'm off track.
- WAFL blocks are 4 KB
- Max # of inodes is one per 4 K block
- Each directory, even an empty one, fills at least one block
- But, up to 128 empty directories can be squeezed into one block
1-3 are right, but not 4. An "empty" directory (they are never really empty, of course, as they have entries for "." and ".." in them) still occupies one 4 KB block in WAFL.
The *pointers* to 128 empty directories (or to anything else) can be "squeezed into one block".
So based on 3, I will run out of room before running out of inodes, but based on 4, I will run out of inodes before running out of room. (Maybe the problem is I don't understand what a dirent is?)
"dirent" = "directory entry" = "one entry in a directory". If "/foo/bar" is an empty directory, it has its own inode and its own 4 KB block with dirents for "." and ".." in it, and there is also a dirent for "bar" pointing to it and living in the directory "/foo".
There's obviously a lot of mutual incomprehension going on in this thread. I'm doing an experiment with my trusty old /vol/test which I hope should clarify what is being claimed, but I'll have to report on it tomorrow (1,000,000 mkdir's are not exactly fast...).
Slightly off topic, can anyone suggest possibly a different, non-NetApp storage/file system that will give me a lot more inodes than I can get out of a filer? (Sorry Greg, but I need the best platform to meet the needs of the application, which might or might not be NetApp, so I have to ask.) The IO demands of this application are quite low, but storage and inode requirements are high. So any solution needs to be able to store large amounts of data and provide massive quantities of inodes, but doesn't need to be as fast as a filer.
I am told that the Veritas (vxfs) filing system, unlike ufs or WAFL, can store sufficiently small directories in the inode, in the same way that WAFL does for small (<=64 bytes) regular files and symlinks.
Chris Thompson University of Cambridge Computing Service, Email: cet1@ucs.cam.ac.uk New Museums Site, Cambridge CB2 3QG, Phone: +44 1223 334715 United Kingdom.
----- Original Message -----
From: "Chris Thompson" cet1@cus.cam.ac.uk
To: "Tom "Mad Dog" Yergeau" MadDog@fool.com
Cc: marty.johnson@erols.com; toasters@mathworks.com
Sent: Wednesday, April 04, 2001 1:11 PM
Subject: Re: Inodes a go go
MadDog@fool.com (Tom "Mad Dog" Yergeau) writes:
Okay, now I am really really confused.
Here's a summary of my understanding; maybe someone can point out where I'm off track.
- WAFL blocks are 4 KB
- Max # of inodes is one per 4 K block
- Each directory, even an empty one, fills at least one block
- But, up to 128 empty directories can be squeezed into one block
1-3 are right, but not 4. An "empty" directory (they are never really empty, of course, as they have entries for "." and ".." in them) still occupies one 4 KB block in WAFL.
The *pointers* to 128 empty directories (or to anything else) can be "squeezed into one block".
Right. In other words, 128 *directory entries* can be squeezed into one 4K block. What's a directory entry? Basically, the "filenames" of the files in a directory: your files named "foo", "mbox", etc. all have their names stored in the 4K data block of the parent directory. Directories also have names, so, like files, their names are stored in the 4K block of the parent.
So every directory takes up 4K, minimum. If a directory has MORE than 128 entries (like a directory with 128 subdirectories all at the same level), then it gets allocated another 4K block.
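Bruce's rule of thumb can be written down as a small calculation. This is a sketch only: the 128-entries-per-block figure is his working assumption (the thread later points out that the real count depends on filename length), and `directory_blocks` is a hypothetical helper name.

```python
import math

BLOCK_KB = 4  # WAFL block size, as stated in this thread


def directory_blocks(entries, entries_per_block=128):
    """Number of 4 KB blocks a directory needs for `entries` dirents.

    `entries` includes "." and "..", so an "empty" directory has 2 entries
    and still needs one block. The default of 128 entries per block is
    Bruce's working figure, not a measured WAFL constant.
    """
    return max(1, math.ceil(entries / entries_per_block))


# An "empty" directory holds only "." and "..": one block.
print(directory_blocks(2) * BLOCK_KB, "KB")    # 4 KB
# 128 subdirectories plus "." and ".." spill into a second block.
print(directory_blocks(130) * BLOCK_KB, "KB")  # 8 KB
```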
One neat thing you can do is create a directory, fill it with a million files, then delete them all, and you can still wind up with a directory that is megabytes in size but empty. Or at least you could. I think deleting files "off the end" of the directory will actually shrink the directory now, but you can fool this by not deleting every 128th file. There's no automatic compression of "empty" directory entries. Unless that has changed too. Haven't tested it in ages. :)
Bruce
I wrote: [...]
There's obviously a lot of mutual incomprehension going on in this thread. I'm doing an experiment with my trusty old /vol/test which I hope should clarify what is being claimed, but I'll have to report on it tomorrow (1,000,000 mkdir's are not exactly fast...).
OK, here goes. /vol/test has one 18 GB data disc, and starts with only a single sub-directory /vol/test/cet1 under the root:
puppis> df /vol/test;df -i /vol/test
Filesystem             kbytes     used     avail capacity  Mounted on
/vol/test/           12518648      220  12518428       0%  /vol/test/
/vol/test/.snapshot   3129660        0   3129660       0%  /vol/test/.snapshot
Filesystem              iused    ifree  %iused  Mounted on
/vol/test/                 97   516065      0%  /vol/test/
We're going to use all the space...
puppis> snap sched test 0
puppis> snap reserve test 0
... and as many inodes as we are allowed to ...
puppis> maxfiles test 9999999
Max inode count cannot exceed 4129415
Cannot support more than 1 inode per 4 KB of disk space
puppis> maxfiles test 4129415
... and now it looks like this:
puppis> df /vol/test;df -i /vol/test
Filesystem             kbytes     used     avail capacity  Mounted on
/vol/test/           15648308      224  15648084       0%  /vol/test/
/vol/test/.snapshot         0        0         0     ---%  /vol/test/.snapshot
Filesystem              iused    ifree  %iused  Mounted on
/vol/test/                 97  4129318      0%  /vol/test/
I create 100 empty directories /vol/test/cet1/00, /vol/test/cet1/01, ... /vol/test/cet1/99:
puppis> df /vol/test;df -i /vol/test
Filesystem             kbytes     used     avail capacity  Mounted on
/vol/test/           15648308      672  15647636       0%  /vol/test/
/vol/test/.snapshot         0        0         0     ---%  /vol/test/.snapshot
Filesystem              iused    ifree  %iused  Mounted on
/vol/test/                197  4129218      0%  /vol/test/
... and in each of those I now create 100 empty subdirectories, such as /vol/test/cet1/31/41 ...
puppis> df /vol/test;df -i /vol/test
Filesystem             kbytes     used     avail capacity  Mounted on
/vol/test/           15648308    42832  15605476       0%  /vol/test/
/vol/test/.snapshot         0        0         0     ---%  /vol/test/.snapshot
Filesystem              iused    ifree  %iused  Mounted on
/vol/test/              10197  4119218      0%  /vol/test/
... and in each of *those* I create 100 empty subdirectories, such as /vol/test/cet1/31/41/59 ...
puppis> df /vol/test;df -i /vol/test
Filesystem             kbytes     used     avail capacity  Mounted on
/vol/test/           15648308  4195216  11453092      27%  /vol/test/
/vol/test/.snapshot         0        0         0     ---%  /vol/test/.snapshot
Filesystem              iused    ifree  %iused  Mounted on
/vol/test/            1010197  3119218     24%  /vol/test/
And I can't really justify continuing to kill the cache on this filer until something actually runs out! But you will, I trust, see that the ratio of space used to inodes used is essentially 4 KB : 1 inode. (Actually a bit worse, as the inode and blockmap metafiles are being filled in as well.) It's a close thing, but space is actually going to run out before the inodes do.
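The arithmetic behind that conclusion can be checked directly from the df figures in the experiment. This is a back-of-the-envelope sketch: it only uses numbers copied from the df output, and the "every remaining inode becomes another empty directory" scenario is an assumption for illustration.

```python
BLOCK_KB = 4

# Starting point, from the df output taken after "maxfiles test 4129415".
start_used_kb, start_iused = 224, 97
# End point, from the df output after the third round of mkdir's.
end_used_kb, end_iused = 4195216, 1010197
avail_kb, ifree = 11453092, 3119218

# Space consumed per inode used during the experiment.
kb_per_inode = (end_used_kb - start_used_kb) / (end_iused - start_iused)
print(f"{kb_per_inode:.2f} KB per inode")  # 4.15 KB per inode

# If every remaining inode became another empty directory at 4 KB apiece:
needed_kb = ifree * BLOCK_KB
print(f"need {needed_kb} KB, have {avail_kb} KB")
print("space runs out first" if needed_kb > avail_kb else "inodes run out first")
```

The per-inode figure comes out slightly above 4 KB, which fits the remark about the inode and blockmap metafiles growing as well.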
Chris Thompson University of Cambridge Computing Service, Email: cet1@ucs.cam.ac.uk New Museums Site, Cambridge CB2 3QG, Phone: +44 1223 334715 United Kingdom.
Basic stuff here that most folks already know, but it might help some newbies understand what's going on.
Files and directories are stored in a volume (filesystem) using a table of inodes and a pool of data blocks. Each file and directory is represented by an inode, a 128-byte data structure. A data block is 4 KB and is used for storing file data. A data block is either unused (free) or assigned to a file.
The number of inodes determines how many files and directories the volume can hold. The number of data blocks determines how much file data the volume can hold. You can increase the number of inodes in a volume with the maxfiles command, but that decreases the number of data blocks, since some data blocks must be reassigned to the inode table to enlarge it. You cannot decrease the size of the inode table except by destroying the volume and rebuilding it. Whenever you add a new disk to a volume, both the data block pool and the inode table are enlarged.
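The trade-off described here can be modelled with a little arithmetic. This sketch assumes the 128-byte inode size given in this message and assumes the whole inode table is allocated up front, exactly as described; Chris's df output earlier in the thread barely moved (220 KB to 224 KB used) after his maxfiles change, so WAFL may allocate its inode file more lazily than this. The function names are hypothetical.

```python
import math

INODE_BYTES = 128   # inode size given in this message
BLOCK_BYTES = 4096  # data block size
INODES_PER_BLOCK = BLOCK_BYTES // INODE_BYTES  # 32 inodes per block


def inode_table_blocks(max_inodes):
    """Data blocks needed to hold an inode table sized for max_inodes."""
    return math.ceil(max_inodes / INODES_PER_BLOCK)


def maxfiles_cost_blocks(old_max, new_max):
    """Data blocks moved into the inode table when raising maxfiles.

    Assumes the whole table is allocated up front.
    """
    if new_max < old_max:
        raise ValueError("the inode table cannot be shrunk")
    return inode_table_blocks(new_max) - inode_table_blocks(old_max)


# Doubling the limit from 1,000,000 to 2,000,000 inodes:
extra = maxfiles_cost_blocks(1_000_000, 2_000_000)
print(extra, "blocks =", extra * BLOCK_BYTES // 1024, "KB")  # 31250 blocks = 125000 KB
```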
Each inode contains information about a file or directory, including:
- owner (UID number)
- group (GID number)
- permission bits (read/write/execute for user/group/other)
- file type bits (regular file, directory, device, symlink, etc.)
- device id (identifies volume containing the file)
- quota tree id (identifies quota tree containing the file)
- file size (length of file in bytes)
- link count (number of hard links to the file)
- timestamps (data access time, data mod time, inode mod time)
- data block list (specifies which data blocks hold file contents)
For files up to 64 bytes long, the file data is stored in the inode in the space used for the data block list. This avoids consuming an entire 4k data block to hold just a few bytes.
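The effect of that 64-byte cutoff on space usage can be sketched as follows. The helper name is hypothetical, and the sketch ignores the indirect blocks that large files also need.

```python
import math

BLOCK_BYTES = 4096
INLINE_LIMIT = 64  # bytes that fit in the inode's data block list area


def data_blocks_for_file(size_bytes):
    """Data blocks consumed by a regular file of size_bytes (inode not counted).

    Indirect blocks for very large files are ignored in this sketch.
    """
    if size_bytes <= INLINE_LIMIT:
        return 0  # data lives inside the inode itself
    return math.ceil(size_bytes / BLOCK_BYTES)


for size in (0, 64, 65, 4096, 4097):
    print(size, "bytes ->", data_blocks_for_file(size), "block(s)")
```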
Each inode is identified by an inode number, which can be thought of as an index into the inode table. An inode is accessed by calculating its offset in the table using its inode number.
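The offset calculation is a multiply and a divide. This sketch assumes the table is numbered from 0 and packed with 128-byte inodes and no header; in WAFL the table lives in a metafile (Chris mentions the inode metafile), so "block index" here means a block of that file, not a fixed disk address. The function name is hypothetical.

```python
INODE_BYTES = 128
BLOCK_BYTES = 4096


def inode_location(inum):
    """Locate inode `inum` in a flat inode table.

    Returns (block index within the table, byte offset within that block).
    """
    offset = inum * INODE_BYTES
    return divmod(offset, BLOCK_BYTES)


print(inode_location(0))   # (0, 0)
print(inode_location(33))  # (1, 128)
```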
In particular, an inode does NOT store the file name. The file name is stored in the parent directory.
A directory is just a file that contains a list of directory entries called "dirents". A dirent consists of a file name and an inode number. Dirents are variable length depending on the length of the file name. To see a sorted list of the dirents in a directory, run "ls -ai".
When you access a file (or directory) the parent directory is read to find the dirent with the file name. Using the corresponding inode number, the inode for the file is located in the inode table.
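The lookup described in the last two paragraphs can be sketched with a toy in-memory model. Everything here is hypothetical: the dict-based inode table, the inode numbers, and the `lookup` helper. Note that the names exist only in the parent directories' dirents, never in the inodes themselves.

```python
# Hypothetical in-memory model: inode number -> inode. A directory inode's
# data is its list of dirents, shown here as a dict {file name: inode number}.
# The inode numbers are arbitrary.
ROOT_INUM = 2
inode_table = {
    2: {"type": "dir", "data": {".": 2, "..": 2, "foo": 10}},
    10: {"type": "dir", "data": {".": 10, "..": 2, "bar": 11, "notes.txt": 12}},
    11: {"type": "dir", "data": {".": 11, "..": 10}},
    12: {"type": "file", "data": b"hello"},
}


def lookup(path):
    """Resolve an absolute path to an inode number, one dirent at a time."""
    inum = ROOT_INUM
    for name in path.split("/"):
        if not name:
            continue  # skip empty components from leading or repeated "/"
        inode = inode_table[inum]
        if inode["type"] != "dir":
            raise NotADirectoryError(path)
        if name not in inode["data"]:
            raise FileNotFoundError(path)
        inum = inode["data"][name]  # the dirent gives the next inode number
    return inum


print(lookup("/foo/bar"))     # 11
print(lookup("/foo/bar/.."))  # 10
```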
A newly created directory contains two dirents for the file names "." and "..". The "." dirent has the directory's own inode number while ".." has the parent directory's inode number. So these are not just a notational convention; they actually exist. A data block is assigned to the directory to store these two dirents. So the new directory consumes 1 inode, 1 data block, and a dirent in the parent directory.
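That per-directory cost can be checked against Chris Thompson's experiment earlier in the thread. This is a simplified accounting sketch (the `Usage` class is hypothetical): it charges exactly one inode, one block and one parent dirent per mkdir, and does not model parent-directory growth or the filer's own metadata.

```python
from dataclasses import dataclass

BLOCK_KB = 4


@dataclass
class Usage:
    inodes: int = 0
    blocks: int = 0
    dirents: int = 0  # entries added to parent directories

    def mkdir(self, count=1):
        """Account for `count` new, empty directories.

        Each costs one inode, one data block (for "." and ".."), and one
        dirent in its parent. Parent-directory growth and filer metadata
        are not modelled.
        """
        self.inodes += count
        self.blocks += count
        self.dirents += count


# Chris Thompson's three rounds of mkdir on /vol/test: 100, then 100 in each
# of those, then 100 in each of *those*.
usage = Usage()
for level in (1, 2, 3):
    usage.mkdir(100 ** level)

print(usage.inodes, "inodes,", usage.blocks * BLOCK_KB, "KB")  # 1010100 inodes, 4040400 KB
```

Adding the 97 inodes already in use gives 1,010,197, which matches his final `df -i` exactly. His measured space (4,195,216 KB used) is somewhat higher than 4,040,400 KB, presumably the metafile overhead he mentions.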
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support
On Wed, 4 Apr 2001, Tom "Mad Dog" Yergeau wrote:
Okay, now I am really really confused.
Here's a summary of my understanding; maybe someone can point out where I'm off track.
- WAFL blocks are 4 KB
- Max # of inodes is one per 4 K block
- Each directory, even an empty one, fills at least one block
- But, up to 128 empty directories can be squeezed into one block
This is slightly wrong. When you create a directory it gets a block allocated to it. The reason for this is that there really is no such thing as an empty directory. To demonstrate this do the following:
mkdir empty
cd empty
ls -la
The ls command will print out two items, "." and "..".
Now, based on the assumption that a directory block can hold 128 items, you can add 126 additional items to the directory that you just created. If the items you create are additional directories, they will in turn each consume an inode and a data block. On the other hand, if you create a zero-length file, it does not consume any resources other than an inode and the small amount of space it takes in the existing directory block.
So, if you create a small number of directories and fill them with a huge number of files you will run out of inodes long before you run out of actual disk space. On the other hand if you create a huge number of directories you should run out of inodes and actual disk space at about the same time.
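The two scenarios can be compared with a small model. This is a sketch under the costs stated above (a directory takes 1 inode and 1 block; a zero-length file takes 1 inode and no blocks); it ignores directories growing past their first block, and `first_exhausted` is a hypothetical helper.

```python
def first_exhausted(max_inodes, free_blocks, files_per_dir):
    """Which resource runs out first when filling a volume with directories,
    each holding `files_per_dir` zero-length files.

    Assumed costs: a directory takes 1 inode + 1 block; a zero-length file
    takes 1 inode and no blocks. Directories outgrowing their first block
    are ignored.
    """
    dirs_by_inodes = max_inodes // (1 + files_per_dir)
    dirs_by_blocks = free_blocks  # one block per directory
    if dirs_by_inodes < dirs_by_blocks:
        return "inodes"
    if dirs_by_blocks < dirs_by_inodes:
        return "blocks"
    return "both"


# One inode per 4 KB block, as on a volume set to the maximum maxfiles value:
print(first_exhausted(1_000_000, 1_000_000, files_per_dir=0))    # both
print(first_exhausted(1_000_000, 1_000_000, files_per_dir=100))  # inodes
```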
One caveat to all of this is that the assumption that you can fit 128 items into a directory block is just wrong. You can fit 128 inodes into a single 4K data block, but a directory entry is not an inode. A directory entry consists of the name of the file and a pointer to the inode. The number of directory entries that you can fit into a single 4K directory block is dependent on the length of the filenames you create.
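How name length drives the count can be illustrated with a concrete record layout. WAFL's actual on-disk dirent format is not described anywhere in this thread, so this sketch borrows the BSD UFS layout (8-byte header, then the NUL-terminated name padded to a 4-byte boundary) purely as a stand-in; the helper names are hypothetical.

```python
BLOCK_BYTES = 4096
# Assumed header: 4-byte inode number, 2-byte record length, 1-byte file type,
# 1-byte name length (the BSD UFS layout, NOT a documented WAFL format).
HEADER_BYTES = 8


def dirent_size(name):
    """Bytes taken by one entry: header + name + NUL, padded to a 4-byte boundary."""
    raw = HEADER_BYTES + len(name.encode()) + 1
    return (raw + 3) // 4 * 4


def entries_per_block(name_length):
    """Entries that fit in one directory block if every name has this length."""
    return BLOCK_BYTES // dirent_size("x" * name_length)


for n in (1, 4, 20, 255):
    print(f"{n:3d}-char names: {entries_per_block(n)} entries per block")
```

Under this assumed layout, short names pack many more entries into a block and long names far fewer, which is the point being made: the count is set by the names, not by a fixed per-block limit.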
Yeah, so like I even confused myself in the email I just sent out before I got this one. Exactly how many directory entries you get depends on how much data it takes up, which depends on the length of the file names. But, no matter what, you can't get more than 128 because every entry is a pointer to an inode. Except maybe for special files?
Bruce
Slightly off topic, can anyone suggest possibly a different, non-NetApp storage/file system that will give me a lot more inodes than I can get out of a filer? (Sorry Greg, but I need the best platform to meet the needs of the application, which might or might not be NetApp, so I have to ask.) The IO demands of this application are quite low, but storage and inode requirements are high. So any solution needs to be able to store large amounts of data and provide massive quantities of inodes, but doesn't need to be as fast as a filer.
Look into my eyes.... you're getting sleepy sleeeeeeppy.... now that you're under, think WAFL.. WAFL... WAFL.... when I snap my fingers you will wake up and make yourself two waffles and then order a filer with WAFL! and you will also agree that Duke SUCKS.., THE TERPS RULE and that they WAS ROBBED!
;-)