There used to be a NetApp tech report (I can't find it anymore) that gave some specifics on this very question; it compared large directories on NetApp with other technologies, though the paper may well be obsolete by now.
Glenn (the other one)
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Stephen C. Losen
Sent: Wednesday, July 19, 2006 4:54 PM
To: Glenn Walker
Cc: Page, Jeremy; David Lee; toasters@mathworks.com
Subject: Re: directory efficiency
These considerations still matter, but the number of files in a directory has to be pretty large before performance becomes an issue. Generally speaking, splitting up the entries so that there are fewer per directory is still a good idea - but in this case, I'd say that only 200 per directory is too small and a bit of a waste.
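If you do end up splitting things, one common approach (nothing NetApp-specific, just a rough sketch; the paths and bucket count here are made up for illustration) is to hash the file name into a fixed number of subdirectory buckets:

    import hashlib
    import os

    def bucket_path(base_dir, filename, buckets=256):
        """Map a file name to one of `buckets` subdirectories using a stable hash."""
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % buckets
        return os.path.join(base_dir, "%03d" % bucket, filename)

    # Example: a file lands in something like /data/042/report-2006-07.txt
    path = bucket_path("/data", "report-2006-07.txt")
    os.makedirs(os.path.dirname(path), exist_ok=True)

With 256 buckets, a million files works out to roughly 4,000 entries per directory, which stays comfortably under the kinds of ceilings discussed below.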
What you're looking for is a ceiling at which the number of entries becomes too much to bear - and that depends on the application. I would think keeping it under 100,000 would suffice for WAFL, but at some point the client will take an inordinate amount of time to enumerate all of the entries. Perhaps something more along the lines of 10,000 entries would be a good place to start?
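Since the right ceiling really does depend on your client and application, the simplest thing is to measure it yourself. A rough sketch (plain Python, and the mount points are invented) that times a full listing as the entry count grows:

    import os
    import time

    def time_listing(path):
        """Time a full enumeration of one directory's entries."""
        start = time.time()
        count = sum(1 for _ in os.scandir(path))
        return count, time.time() - start

    # Point this at test directories of increasing size (e.g. 10k and 100k entries)
    for d in ["/mnt/filer/test_10k", "/mnt/filer/test_100k"]:
        if os.path.isdir(d):
            n, secs = time_listing(d)
            print("%s: %d entries in %.2f seconds" % (d, n, secs))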
Directories with thousands of entries are always slow if you need to scan the entire directory to list its contents or look for file names that match a pattern. But if you are accessing one file whose name you know, then WAFL will access it very quickly, even if the directory has many thousands of entries. I am pretty sure that WAFL avoids sequential directory scans by using hashing and/or search trees.
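To make the distinction concrete, here is a rough illustration (the directory and file names are invented) of the two access patterns: opening a known name, which the filer can resolve with a single lookup, versus matching a pattern, which forces the client to read every entry:

    import fnmatch
    import os

    big_dir = "/mnt/filer/logs"   # imagine a directory with many thousands of entries
    known = os.path.join(big_dir, "app-2006-07-19.log")

    # Fast path: one lookup by exact name -- no scan of the directory needed.
    if os.path.exists(known):
        size = os.stat(known).st_size

    # Slow path: pattern matching forces a read of every entry in the directory.
    if os.path.isdir(big_dir):
        matches = [e.name for e in os.scandir(big_dir)
                   if fnmatch.fnmatch(e.name, "app-2006-07-*.log")]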
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support