From: owner-toasters@mathworks.com on behalf of Page, Jeremy Sent: Wed 7/19/2006 12:27 PM To: David Lee; toasters@mathworks.com Subject: RE: directory efficiency

I think it might. As far as I know, filers still use inodes, and it seems the nesting levels would be what causes the problems here... but since the number of blocks they can address grows with the square of the number of inode levels deep, I'm surprised this would cause performance issues. Adam can probably give you a 100% answer.
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of David Lee Sent: Wednesday, July 19, 2006 10:37 AM To: toasters@mathworks.com Subject: directory efficiency
(Is the following an FAQ? If so, apologies, point me in the right direction...)
In years gone by, on a typical UNIX filesystem, creating entries (e.g. new files) in a directory could become badly inefficient as that directory's size increased. By the time that (for instance) a "/var/spool/mail" reached (say) 20,000 entries, the acts of creating and deleting lockfiles in that directory could seriously impede things. (As a consequence, various sites devised various workarounds; for instance, we split into 100 subdirectories based on "uid mod 100", so that each subdir held only around (say) 200 entries.)
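For concreteness, a minimal sketch of that kind of split in Python (the spool path and helper name are only illustrative, not our actual script):

    import os

    SPOOL_ROOT = "/var/spool/mail"   # illustrative base path
    FANOUT = 100                     # 100 subdirectories, as in "uid mod 100"

    def spool_path(username, uid):
        """Mailbox path under the uid-mod-100 subdirectory scheme."""
        subdir = "%02d" % (uid % FANOUT)     # "00" .. "99"
        return os.path.join(SPOOL_ROOT, subdir, username)

    # Example: uid 1234 -> /var/spool/mail/34/alice
    print(spool_path("alice", 1234))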
When we migrated this to NetApp a couple of years back, we simply kept this subdivision in place. But now another consideration is coming into play which would be eased if we could consider reverting to the single large (~20,000 entry) directory.
Do such efficiency considerations matter in NetApp WAFL?
These considerations still matter, but the number of files in a directory has to be pretty large for performance concerns to come into play. Generally speaking, splitting up the entries so that there are fewer per directory is still a good idea - but in this case, I'd say that only 200 per directory is too small and a bit of a waste.

What you're looking for is a ceiling at which the number of entries becomes too much to bear - and that depends on the application. I would think keeping it under 100,000 would suffice for WAFL, but at some point the client will take an inordinate amount of time to enumerate all of the entries. Perhaps something more along the lines of 10,000 entries would be a good place to start?
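To put that in rough numbers, a quick sketch (the 10,000 figure is just the starting point I suggested above, not a WAFL limit):

    import math

    def subdirs_needed(total_entries, max_per_dir=10000):
        """Evenly-loaded subdirectories needed to stay under max_per_dir."""
        return max(1, math.ceil(total_entries / max_per_dir))

    # The original ~20,000-entry spool split 100 ways gives ~200 per subdir;
    # a 10,000-entry target would need only 2 subdirectories.
    print(subdirs_needed(20000, 200))      # -> 100 (the existing scheme)
    print(subdirs_needed(20000, 10000))    # -> 2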
Glenn
Directories with thousands of entries are always slow if you need to scan the entire directory to list its contents or look for file names that match a pattern. But if you are accessing one file whose name you know, then WAFL will access it very quickly, even if the directory has many thousands of entries. I am pretty sure that WAFL avoids sequential directory scans by using hashing and/or search trees.
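From the client side, the difference between those two access patterns looks roughly like this (a sketch only; the mount point and file names are placeholders, and the hashing is just my surmise about WAFL internals):

    import os
    import time

    def time_enumeration(dirpath):
        """Full directory scan: client cost grows with the number of entries."""
        start = time.perf_counter()
        with os.scandir(dirpath) as entries:
            names = [e.name for e in entries]
        return time.perf_counter() - start, len(names)

    def time_known_lookup(dirpath, name):
        """Access one file by exact name: a single lookup/stat round trip."""
        start = time.perf_counter()
        os.stat(os.path.join(dirpath, name))
        return time.perf_counter() - start

    # Usage against an NFS-mounted filer directory (paths are placeholders):
    # print(time_enumeration("/mnt/filer/bigdir"))
    # print(time_known_lookup("/mnt/filer/bigdir", "known_file"))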
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support
Just like traditional UNIX filesystems, I'm pretty sure WAFL has no directory index. So you have a linear search through a directory for access.
This is the reason for maxdirsize: it puts a limit on the number of entries you can put in a directory, and therefore on the amount of searching you have to do within it.
The guide discusses this.
maxdirsize number
Sets the maximum size (in KB) to which a directory can grow. This is set to 1% of the total system memory by default. Most users should not need to change this setting. If this setting is changed to be above the default size, a notice message will be printed to the console explaining that this may impact performance. This option is useful for environments in which system users may grow a directory to a size that starts impacting system performance. When a user tries to create a file in a directory that is at the limit, the system returns an ENOSPC error and fails the create.
Note that the default should be able to hold considerably more than the 20K entries of the OP. I'd rather not design a process that requires such an architecture, but that many entries in a directory wouldn't worry me very much unless the filer were CPU-stressed to begin with.
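As a back-of-the-envelope check on that, with a memory size and per-entry cost that are only assumptions for illustration (not numbers from NetApp documentation):

    # Rough estimate of entries under the default maxdirsize (1% of memory).
    system_memory_kb = 2 * 1024 * 1024                # assume a 2 GB filer
    default_maxdirsize_kb = system_memory_kb // 100   # ~20 MB of directory

    bytes_per_entry = 64        # assumed average directory entry size
    approx_entries = default_maxdirsize_kb * 1024 // bytes_per_entry

    print(approx_entries)       # roughly 335,000 -- well above 20,000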