George Kahler wrote:
Q: Will mbx buy me performance on the toaster?
Q: Is Maildir a better choice?
As with all difficult questions, the answer is "it depends." People have given some very good reasons why maildir is an appropriate choice given certain constraints, so let me mention a few things that favor mbox-format mail storage on a filer. We run several tens of millions of user mailboxes on a fleet of NetApps for YahooMail, so here are a few other points to consider that we have run into over the past few years:
1) Locking is a dead issue for mbox. There is no way to adequately provide file locking via the standard channels, so you will probably need to deal with locking as a separate problem. We ended up writing our own mailbox locking system, but you might be able to use procmail as your local delivery agent and let that try to figure out the locking issues. Locking is important, but do not fixate on this issue and ignore all of the other points that you need to consider.
2) Inode cost should not be overlooked. You get at most 32 million per volume, so if you have a lot of mailboxes a maildir or other one-file-per-message scheme will chew through your inode limit long before you run out of disk space on that volume. To compensate you will have to make smaller volumes. This is not necessarily a bad thing, but each volume will be at least one additional RAID group and one additional wasted parity disk (increased cost/mailbox). The larger number of inodes in use will also increase the time it takes your filer to complete a wack run and will have an impact on backup speeds.
3) Mail messages are small. There is a simple bimodal distribution to internet mail message size: they are either very small or very large. The small files will cost you storage bits because the filer uses 4K to store each 500-byte message. The small file sizes will also keep your read chain length very low, so you will get significantly decreased performance on your backups of this volume and you will notice some decreased read performance on the filer in general.
jim
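To put rough numbers on points 2 and 3, here is a minimal Python sketch. The 32 million inode ceiling and the 4K block size come from the post above; the per-mailbox overhead of 4 inodes (a mailbox directory plus tmp, new and cur) and the 200-messages-per-mailbox figure are assumptions made up for illustration, not measurements from any filer.

```python
import math

INODES_PER_VOLUME = 32_000_000  # per-volume ceiling quoted in the post
BLOCK_SIZE = 4096               # 4K allocation unit quoted in the post


def mailboxes_per_volume(messages_per_mailbox, overhead_inodes=4):
    """How many one-file-per-message mailboxes fit before inodes run out.

    overhead_inodes is an assumption: one inode each for the mailbox
    directory and its tmp/new/cur subdirectories.
    """
    inodes_per_mailbox = messages_per_mailbox + overhead_inodes
    return INODES_PER_VOLUME // inodes_per_mailbox


def stored_bytes(message_size):
    """Space consumed when every file is rounded up to whole 4K blocks."""
    return math.ceil(message_size / BLOCK_SIZE) * BLOCK_SIZE


if __name__ == "__main__":
    print(mailboxes_per_volume(200))   # mailboxes per volume at 200 msgs each
    print(stored_bytes(500))           # bytes on disk for a 500-byte message
```

Under these assumptions a volume tops out at roughly 157,000 mailboxes regardless of free disk space, and a 500-byte message occupies a full 4096 bytes.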
On Mon, 18 Oct 1999, Jim McCoy wrote:
- Mail messages are small. There is a simple bimodal distribution to internet mail message size: they are either very small or very large. The small files will cost you storage bits because the filer uses 4K to store each 500-byte message. The small file sizes will also keep your read chain length very low, so you will get significantly decreased performance on your backups of this volume and you will notice some decreased read performance on the filer in general.
You have valid points. Do you think that the decreased read performance is offset by the lowered necessity to read all mail each time you adjust the mailbox? I'm mostly interested in IMAP/POP type environments where most of the mail that is read and deleted is new mail. Old mail simply piles up. Keeping old mail in one file would be great because it would save space and inodes and speed up backups, but rewriting all of those messages in order to remove one from the middle would be costly in performance.
One of our problems is mailbox "corruption." Not really corruption in the usual sense, but header features which make mail readers croak, like a couple hundred (if not thousand) addresses on the "To:" lines. Sifting through individual files and unlinking a single file is easier than parsing one huge mail spool file and programmatically removing such messages. This is especially true since one need only check messages whose files were created since the last time mail was accessed successfully.
What do you think?
Tom
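The per-file cleanup described above might look something like the following Python sketch. It is only an illustration under stated assumptions: the function name, the 200-recipient threshold, and the use of file modification time as a stand-in for "created since the last successful access" are all hypothetical choices, not anything taken from a real mail server.

```python
import os
from email.parser import BytesParser
from email.utils import getaddresses

MAX_RECIPIENTS = 200  # hypothetical threshold for a "bad" To: header


def suspicious_messages(directory, last_good_access,
                        max_recipients=MAX_RECIPIENTS):
    """Return paths of message files newer than last_good_access whose
    To: headers carry more than max_recipients addresses.

    last_good_access is a Unix timestamp; older files are skipped
    without being opened.
    """
    flagged = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            if entry.stat().st_mtime <= last_good_access:
                continue  # already read successfully once, skip it
            with open(entry.path, "rb") as fp:
                # headersonly=True stops parsing at the end of the headers
                headers = BytesParser().parse(fp, headersonly=True)
            recipients = getaddresses(headers.get_all("To", []))
            if len(recipients) > max_recipients:
                flagged.append(entry.path)
    return sorted(flagged)
```

A caller could then unlink (or quarantine) each returned path; with a single spool file the same cleanup means rewriting the whole mailbox.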
FWIW:
we went from dir to mailbox format a few months ago. We have a 740, and the filer went from a sustained 2k nfs-ops down to ~200 ops and from 1.7mb/sec down to about 70k/sec. (We also doubled the number of users on the system over the same period.)
one thing I found to be a big win is to make sure all the filenames are 12 characters or less. evidently, the netapp only caches names that are <12 characters long, so if the name is >12 characters it takes a disk-op to read the directory.
each 4k inode can only store 127 24-byte (12 unicode) entries, so it was also a big inode saver when we went to the <12char file/dir names.
also, if you use one of the dir permission bits to set flags on the messages, it keeps the entry un-modded for snapshots. then you can turn snapshots way up and worst case you have each message stored only once. no more multiple copies of huge mailboxes.
-Luke
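A rough Python sketch of the two tricks above, under stated assumptions: the base-36 naming scheme, the helper names, and the choice of the owner-execute bit as a "seen" flag are all hypothetical, and this is one reading of "permission bits", not Luke's actual setup. The claim about name caching is his observation and is not verified here.

```python
import os
import stat

MAX_NAME_LEN = 12          # length limit taken from the post
SEEN_BIT = stat.S_IXUSR    # hypothetical: owner-execute bit means "seen"
_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def short_name(sequence_number):
    """Encode a non-negative message counter as a base-36 name of at
    most MAX_NAME_LEN characters."""
    if sequence_number < 0:
        raise ValueError("sequence number must be non-negative")
    n, name = sequence_number, ""
    while True:
        n, remainder = divmod(n, 36)
        name = _DIGITS[remainder] + name
        if n == 0:
            break
    if len(name) > MAX_NAME_LEN:
        raise ValueError("name would exceed %d characters" % MAX_NAME_LEN)
    return name


def mark_seen(path):
    """Set the flag with chmod; the file is neither renamed nor rewritten."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | SEEN_BIT)


def is_seen(path):
    """Report whether the flag bit is set on the message file."""
    return bool(os.stat(path).st_mode & SEEN_BIT)
```

The design point is that the flag lives in the file's mode rather than in its name, so marking a message does not touch the directory or the message data. This relies on POSIX permission semantics on the mail client host.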
On Mon, 18 Oct 1999 tkaczma@gryf.net wrote:
You have valid points. Do you think that the decreased read performance is offset by the lowered necessity to read all mail each time you adjust the mailbox?
On our business virtual hosting servers, customers are allowed to keep as much mail as they want (well, up to a per-customer hard limit of 1 gigabyte). There are users with two or three hundred-megabyte mailbox files who have Eudora or Outlook set to check for new mail every 5 minutes. Depending on what else is going on (the servers also handle web hosting), each POP3 process may only be able to pull in 1 or 2 megabytes per second. When you have a thousand domains per server, and tens to hundreds of users per domain, the filers literally spend 95% of their CPU scanning mailboxes that, for the most part, don't change very much.
Switching to a split mailbox format will absolutely help my situation. The time needed to generate a mailbox index is now proportional to the number of messages in the mailbox, not the size of the mailbox. With average message size skyrocketing (don't people send e-mail without 100K attachments anymore?!), this makes sense. I'm happy because I get back 90% of my filer's CPU and NFS ops. The customer is happy because now it only takes them a few seconds to check for new messages, not a couple of minutes.
Keeping old mail in one file would be great because it would save space and inodes and speed up backups, but rewriting all of those messages in order to remove one from the middle would be costly in performance.
I haven't done any before-and-after comparisons yet, but I imagine incremental or differential backups would go much faster with split mailboxes. A user with a 100MB monolithic mailbox who receives a couple of new messages during the day will require the entire mailbox to be backed up to tape. With a split mailbox, only those two messages need to be sent to tape. OTOH, deleted messages will be more difficult to track, unless your backup software can keep tabs on which files have disappeared or been renamed since the last full backup.
-- Brian Tao (BT300, taob@risc.org) "Though this be madness, yet there is method in't"
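The backup argument can be put as simple arithmetic. The Python sketch below is a toy model only: it assumes a file-level incremental backup that copies any file whose contents changed, it counts only newly delivered messages (not deletions or flag changes), and the function name and message sizes are invented for illustration.

```python
MB = 1024 * 1024


def incremental_backup_bytes(existing_bytes, new_message_sizes, split):
    """Bytes sent to tape by a file-level incremental backup.

    existing_bytes: mail already in the mailbox before today
    new_message_sizes: sizes of messages delivered since the last backup
    split: True for one file per message, False for a single mbox file
    """
    if not new_message_sizes:
        return 0  # nothing changed, nothing to back up
    new_bytes = sum(new_message_sizes)
    if split:
        return new_bytes              # only the new message files
    return existing_bytes + new_bytes  # the whole changed mailbox file


if __name__ == "__main__":
    print(incremental_backup_bytes(100 * MB, [20_000, 30_000], split=False))
    print(incremental_backup_bytes(100 * MB, [20_000, 30_000], split=True))
```

With a 100MB mailbox and two small new messages, the monolithic case sends a little over 100MB to tape and the split case sends about 50KB.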
On Mon, 25 Oct 1999, Luke Gain wrote:
we went from dir to mailbox format a few months ago.
You mean the other way around?
We have a 740, and the filer went from a sustained 2k nfs-ops down to ~200 ops and from 1.7mb/sec down to about 70k/sec.
Yep, and that big investment in the F740 doesn't hurt as much now. :)
evidently, the netapp only caches names that are <12 characters long, so if the name is >12 characters it takes a disk-op to read the directory.
Is this based on trial-and-error, or has Netapp documented this somewhere?
also, if you use one of the dir permission bits to set flags on the messages it keeps the entry un-modded for snapshots.
Man, talk about abusing the filesystem... ;-)
On Mon, 25 Oct 1999, Luke Gain wrote:
each 4k inode can only store 127 24-byte (12 unicode) entries, so it was also a big inode saver when we went to the <12char file/dir names.
This is not true. You don't save any inodes by having shorter names.
Tom
That was my rationale more or less exactly.
Thanks Brian.
Tom
On Mon, 18 Oct 1999, Jim McCoy wrote:
- Inode cost should not be overlooked. You get at most 32 million per volume, so if you have a lot of mailboxes a maildir or other one-file-per-message scheme will chew through your inode limit long before you run out of disk space on that volume.
BTW, how did you arrive at that number?
Tom