Suppose I have a filer (760) with about 50 million files, between 1k and 500M each. What would be the optimum directory structure to keep them all?
We had the same problem at Yahoo when storing user database entries (e.g. MyYahoo account info) and mailboxes for YahooMail accounts. The biggest win we found was going to great lengths to make sure that directories stay in the cache as much as possible. For YahooMail this was a four-layer structure (/##/##/####/<hash>/maildir), and for MyYahoo it was three layers. The WAFL system is pretty good at lookups on big directories; the only things you need to do to help it out are to load up on system RAM and keep directories small enough that lots of them fit in the filer's cache. Then, with a little bit of luck, lookups run through the directories in cache and hit the disk only for the actual files.
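As an illustration only, here is a minimal Python sketch of a hashed layout shaped like /##/##/####/<hash>/maildir. The hash function and the meaning of each "#" are not specified above, so this sketch assumes an MD5 hex digest with each "#" being one hex character. The function name hashed_path, the root /data, and the widths parameter are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(key, root="/data", widths=(2, 2, 4), leaf="maildir"):
    """Map a key (e.g. an account name) to a nested directory path.

    Assumption: the directory levels are successive slices of an MD5
    hex digest of the key. The real hashing scheme is not known.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()

    # Cut one slice of the digest per directory level: 2, 2, then 4 chars.
    levels = []
    pos = 0
    for width in widths:
        levels.append(digest[pos:pos + width])
        pos += width

    # Final layout: root / ## / ## / #### / <full hash> / leaf
    return PurePosixPath(root, *levels, digest, leaf)


if __name__ == "__main__":
    print(hashed_path("someuser@example.com"))
```

Under these assumptions, each level has a fixed upper bound on its number of entries (256 for a two-character hex level), which is what keeps individual directories small. For the 50 million files in the question, two levels of two hex characters alone would give 65,536 leaf directories, or roughly 760 files each if the hash spreads keys evenly.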
jim mccoy Yahoo! Inc.