FWIW:
We went from dir to mailbox format a few months ago. We have a 740, and the filer went from a sustained 2k NFS ops down to ~200 ops, and from 1.7 MB/sec down to about 70 KB/sec. (We also doubled the number of users on the system over the same period.)
One thing I found to be a big win is to make sure all the filenames are 12 characters or less. Evidently, the NetApp only caches names that are up to 12 characters long, so if a name is longer than 12 characters it takes a disk op to read the directory.
Each 4k inode can only store 127 24-byte (12 Unicode character) entries, so it was also a big inode saver when we went to file/dir names of 12 characters or less.
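To check an existing mail tree against that limit, something like the following could be used. This is only a sketch: the 12-character figure is the one quoted above (not verified against NetApp documentation), and the function name is made up for illustration.

```python
import os

# Limit quoted in the post above; treat it as an assumption, not a spec.
MAX_NAME_LEN = 12


def find_long_names(root, max_len=MAX_NAME_LEN):
    """Return paths under root whose last component is longer than max_len."""
    too_long = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check both directory and file names, since both are directory entries.
        for name in dirnames + filenames:
            if len(name) > max_len:
                too_long.append(os.path.join(dirpath, name))
    return sorted(too_long)
```

Running find_long_names() over the mail spool root would list every entry that needs a shorter name.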
Also, if you use one of the dir permission bits to set flags on the messages, it keeps the entry unmodified for snapshots. Then you can turn snapshots way up, and in the worst case each message is only stored once. No more multiple copies of huge mailboxes.
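A rough sketch of the permission-bit idea, as I read it: the flag lives in the mode bits, so setting it is a chmod rather than a rename or a rewrite of the message. The choice of the "other execute" bit and the function names are assumptions for illustration; the post does not say which bit was actually used, or whether it was set on a file or a directory.

```python
import os
import stat

# Assumption: the "other execute" bit is otherwise unused on message files,
# so it can double as a "seen" flag.
SEEN_BIT = stat.S_IXOTH


def mark_seen(path):
    """Set the flag with chmod; the message contents are not rewritten."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | SEEN_BIT)


def clear_seen(path):
    """Clear the flag, leaving all other permission bits alone."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~SEEN_BIT)


def is_seen(path):
    """Report whether the flag bit is currently set."""
    return bool(os.stat(path).st_mode & SEEN_BIT)
```

The alternative of encoding flags in the file name needs a rename for every flag change, which modifies the directory itself.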
-Luke
On Mon, 18 Oct 1999 tkaczma@gryf.net wrote:
You have valid points. Do you think that the decreased read performance is offset by the lowered necessity to read all mail each time you adjust the mailbox?
On our business virtual hosting servers, customers are allowed to keep as much mail as they want (well, up to a per-customer hard limit of 1 gigabyte). There are users with two or three hundred-megabyte mailbox files who have Eudora or Outlook set to check for new mail every 5 minutes. Depending on what else is going on (the servers also handle web hosting), each POP3 process may only be able to pull in 1 or 2 megabytes per second. When you have a thousand domains per server, and tens to hundreds of users per domain, the filers literally spend 95% of their CPU scanning mailboxes that, for the most part, don't change very much.
Switching to a split mailbox format will absolutely help my situation. The time needed to generate a mailbox index is now proportional to the number of messages in the mailbox, not the size of the mailbox. With average message size skyrocketing (don't people send e-mail without 100K attachments anymore?!), this makes sense. I'm happy because I get back 90% of my filer's CPU and NFS ops. The customer is happy because now it only takes them a few seconds to check for new messages, not a couple of minutes.
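To make the difference concrete, here is a minimal sketch of the two ways of building an index. It assumes an mbox-style monolithic file (messages separated by lines starting with "From ") and a split mailbox with one file per message; both function names are hypothetical, and the mbox detection is deliberately simplified.

```python
import os


def index_monolithic(mbox_path):
    """Count messages in an mbox-style file; every byte has to be read."""
    count = 0
    bytes_read = 0
    with open(mbox_path, "rb") as f:
        for line in f:
            bytes_read += len(line)
            # Simplified: treat any line starting with "From " as a separator.
            if line.startswith(b"From "):
                count += 1
    return count, bytes_read


def index_split(mailbox_dir):
    """List (name, size) per message from directory and stat data only;
    no message body is read."""
    entries = []
    with os.scandir(mailbox_dir) as it:
        for entry in it:
            if entry.is_file():
                entries.append((entry.name, entry.stat().st_size))
    return sorted(entries)
```

The first function's cost grows with the total bytes in the mailbox; the second grows with the number of messages, whatever their size.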
Whereas keeping old mail in one file would be great because it would save space and inodes and speed up backups, rewriting all of those messages in order to remove one from the middle would be costly in performance.
I haven't done any before-and-after comparisons yet, but I imagine incremental or differential backups would go much faster with split mailboxes. A user with a 100MB monolithic mailbox who receives a couple of new messages during the day will require the entire mailbox to be backed up to tape. With a split mailbox, only those two messages need to be sent to tape. OTOH, deleted messages will be more difficult to track, unless your backup software can keep tabs on which files have disappeared or been renamed since the last full backup.
-- 
Brian Tao (BT300, taob@risc.org)
"Though this be madness, yet there is method in't"
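For the incremental backup point, the bookkeeping could look roughly like this: record the state of the split mailbox at each backup, then diff two states to find new or changed messages and ones that have vanished. This is an illustrative sketch with made-up function names, not how any particular backup product works.

```python
import os


def snapshot(mailbox_dir):
    """Map each message file name to its (size, mtime) pair."""
    state = {}
    with os.scandir(mailbox_dir) as it:
        for entry in it:
            if entry.is_file():
                st = entry.stat()
                state[entry.name] = (st.st_size, st.st_mtime_ns)
    return state


def plan_incremental(previous, current):
    """Return (files to back up, files gone since the previous snapshot)."""
    # New files and files whose size or mtime changed go to tape.
    to_backup = sorted(n for n, meta in current.items() if previous.get(n) != meta)
    # Names present last time but missing now were deleted or renamed.
    vanished = sorted(n for n in previous if n not in current)
    return to_backup, vanished
```

A renamed message shows up as one vanished name plus one new name, which is the tracking problem mentioned above.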