Hi, I'm about to start converting our university email system from mbox folders to Courier IMAP maildir. There are about 50K users using about 250 GB of space on an F840 cluster; NetApp version 6.2.2. We have also been told that as of this year each student will keep their account for life, so this number will keep growing.
I'm very concerned about the number of inodes/files/volume and the filer performance.
From all the information I could find, some of you said that you get about
  o one inode for every 32 K
  o can be increased to as much as 1 inode for every 4 K (maxfiles)
There was also some discussion that when you do this, the internal WAFL tables become so huge that it will impact performance.
32 million / 50 K users gives me only about 650 files; this is clearly not enough.
Could someone who has gone through this type of conversion give me some pointers on what to do and what not to do?
Thanks, George
------------------------------------------------------------------------------- George Kahler e-mail: george@yorku.ca Sr. Systems Administrator humans: (416) 736-2100 x.22699 Computing and Network Services machines: (416) 736-5830 Ontario, Canada, M3J-1P3
i'm not sure i follow your math.
250G (or ~250,000,000 K) would give you (250000000K / 32 K/i = ) 7812500i
7.8 million inodes should be enough for 50K students as that is about 156250 files per student.
-- email: lance_bailey@pmc-sierra.com box: Lance R. Bailey, unix Administrator vox: +1 604 415 6646 PMC-Sierra, Inc fax: +1 604 415 6151 105-8555 Baxter Place http://www.lydia.org/~zaphod Burnaby BC, V5A 4V7 The philosophy, as I understand it, is that Linux is a kernel. Full stop. Anything else you add to it is simply a matter of taste and need. -- Brian Ven Der Buhs
Ok, so I may be missing something here. I thought that the 7.8 mil / 50K students ~= 156 inodes or files. No ? Isn't one inode equal to one file ?
George
dropped a decimal place, darn sliderule :)
yes, 7.8 mil / 50K students ~= 156 inodes.
but if this is an imap server, each student will only require a minimal number of files.
-- Lance Fly Air Freud, where the seats recline all the way back to childhood.
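For reference, the corrected arithmetic can be checked with a short Python sketch. It uses the thread's own round numbers (250 GB taken as 250,000,000 K, 50K users, one inode per 32 K or per 4 K); the function and variable names are purely illustrative.

```python
def inodes_per_user(volume_kb: int, kb_per_inode: int, users: int) -> int:
    """Whole inodes available per user for a given inode density."""
    total_inodes = volume_kb // kb_per_inode
    return total_inodes // users


VOLUME_KB = 250_000_000  # ~250 GB, using the thread's approximation
USERS = 50_000

default_density = inodes_per_user(VOLUME_KB, 32, USERS)  # one inode per 32 K
max_density = inodes_per_user(VOLUME_KB, 4, USERS)       # one inode per 4 K

print(default_density, max_density)
```

Since maildir stores one message per file, whichever figure applies has to be compared with the real number of messages each user keeps.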
I have users (students and staff) that have on average about 2k to 3K email messages in their current mbox folders. (there are some that have 10K, 20k, 30k)
George
but is that one file or 2K files? the IMAP server i've used (admittedly unix) keeps one file which is comprised of all the mail messages. we have one user here with over 4600 mail messages - all in one file that he accesses via IMAP.
-- Lance Not *everything* about the The Borg was bad.
On Tuesday, Feb 18, 2003, at 15:45 US/Eastern, Lance R Bailey wrote:
dropped a decimal place, darn sliderule :)
yes, 7.8 mil / 50K students ~= 156 inodes.
but if this is an imap server, each student will only require a minimal number of files.
With Maildir format each message is a unique file. If you were using Cyrus or UW IMAP this wouldn't stop you though.
daniel
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com]On Behalf Of Daniel Mayfield Sent: Tuesday, February 18, 2003 1:03 PM To: toasters@mathworks.com Subject: Re: inodes/volumes/mildir/Courier IMAP
With Maildir format each message is a unique file. If you were using Cyrus or UW IMAP this wouldn't stop you though.
I had the impression that Maildir was the only reasonably safe format over NFS, since there is no locking needed, etc.
Jordan
2003-02-18T16:34:53 Jordan Share:
I had the impression that Maildir was the only reasonably safe format over NFS, since there is no locking needed, etc.
Close; I think it's fairer to say that Maildir is the only utterly safe, perfectly bulletproof[1] format over NFS, since no locking is needed. Many people do mbox over NFS, and while NFS locking doesn't work perfectly portably with all combinations of server and client, it _does_ work reliably with specific known-good versions of lockd/statd on a well-supported client against the best-supported NFS server implementations, in whose ranks toasters certainly live.
With care, mbox can be reasonably safe over NFS. I still like Maildir much better.
-Bennett
[1] "Maildir classic", using only hostname, pid, time-to-the-second, and a seqno for single programs performing multiple deliveries, isn't perfectly safe (a) with OpenBSD, which randomizes pids rather than using them sequentially, or (b) with systems capable of very fast forking, in the face of a fork bomb. On most systems today pid-wrapping isn't yet an operational issue in practice, and as maildir implementors are including the inum in the maildir must-be-unique filename, that problem is being solved.
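To make the "no locking needed" point concrete, here is a minimal Python sketch of a classic maildir-style delivery: write the message under tmp/ with a name that should be unique, then hard-link it into new/. The name layout (time, pid, per-process sequence number, hostname) follows the "Maildir classic" description above; the function names are hypothetical, and this is not Courier's or any other MTA's actual code.

```python
import os
import socket
import time

_seq = 0  # per-process counter for programs performing multiple deliveries


def unique_name() -> str:
    """Build a classic maildir name: time.pid_seq.hostname."""
    global _seq
    _seq += 1
    return f"{int(time.time())}.{os.getpid()}_{_seq}.{socket.gethostname()}"


def deliver(maildir: str, message: bytes) -> str:
    """Deliver one message into maildir/new without taking any lock."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    name = unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # O_EXCL makes the create fail instead of clobbering an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(message)
        handle.flush()
        os.fsync(handle.fileno())
    # link() fails if the target exists, so a name collision cannot
    # silently overwrite another message.
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)
    return new_path
```

Readers only ever look in new/ and cur/, so they never see a partially written file; that is what removes the need for locks.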
On Tuesday, Feb 18, 2003, at 16:34 US/Eastern, Jordan Share wrote:
I had the impression that Maildir was the only reasonably safe format over NFS, since there is no locking needed, etc.
This is true, but as I recall systems like Cyrus can deliver messages without locking issues using LMTP. I'm *NOT* authoritative on this, so don't take my word for it. I usually only deal with the SMTP side of mail.
daniel
On Tue, Feb 18, 2003 at 04:03:12PM -0500, Daniel Mayfield wrote:
With Maildir format each message is a unique file. If you were using Cyrus or UW IMAP this wouldn't stop you though.
And all that locking will bring performance down immensely.
That's why we converted away.
Worst case for us...create another filesystem and up the inodes there as well and set users' home directories accordingly.
On Tuesday, February 18, 2003, at 04:03 PM, Daniel Mayfield wrote:
With Maildir format each message is a unique file. If you were using Cyrus or UW IMAP this wouldn't stop you though.
Cyrus IMAPd stores one message per file, plus three additional files per mailbox for index, headers and cache.
Bryan
On Tue, Feb 18, 2003 at 12:18:36PM -0800, Lance R Bailey wrote:
i'm not sure i follow your math.
250G (or ~250,000,000 K) would give you (250000000K / 32 K/i = ) 7812500i
7.8 million inodes should be enough for 50K students as that is about 156250 files per student.
Real life ISP with 26K accounts:
f760> maxfiles
Volume vol0: maximum number of files is currently 11355890 (5034495 used).
I am not noticing any problems with performance.
And we are using Courier-IMAP with Postfix as our delivery engine.
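As a rough way to read those numbers, the sketch below parses the `maxfiles` line quoted above and derives the fraction of inodes in use and the average files per account. The regular expression is an assumption based only on this one sample line, and 26,000 is the "26K accounts" figure taken at face value.

```python
import re
from typing import Tuple

MAXFILES_LINE = (
    "Volume vol0: maximum number of files is currently 11355890 (5034495 used)."
)


def parse_maxfiles(line: str) -> Tuple[int, int]:
    """Return (limit, used) from a maxfiles output line like the sample."""
    match = re.search(r"currently (\d+) \((\d+) used\)", line)
    if match is None:
        raise ValueError("unrecognised maxfiles output: " + line)
    return int(match.group(1)), int(match.group(2))


limit, used = parse_maxfiles(MAXFILES_LINE)
fraction_used = used / limit          # share of the inode limit consumed
files_per_account = used // 26_000    # average files per account

print(f"{fraction_used:.1%} of inodes used, ~{files_per_account} files/account")
```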
Guys, one more question; well two ;-)
Can the filer 4K blocks be fragmented ? ie, will a small email message end up using one 4K block no matter what ?
Has anyone come up with a way to restore users' email messages/folders in a maildir IMAP environment? To the user a message or folder will have a bizarre name, and I don't see a way, short of using something like 'grep', to find the file to be restored from the .snapshot.
It would be great if you could point your IMAP client at the .snapshot directory and treat it just as another folder to then copy messages to the real folder.
Thanks, George
2003-02-19T17:11:40 George Kahler:
Can the filer 4K blocks be fragmented ? ie, will a small email message end up using one 4K block no matter what ?
I don't know the answer to that question, but have you checked to see if any significant fraction of your users' messages are actually less than 4KB? The headers alone of your query as it reached me were 3,125 bytes; the message as a whole was 4,063 bytes. Yours was a short text/plain, an increasingly rare phenomenon.
It would be great if you could point your IMAP client at the .snapshot directory and treat it just as another folder to then copy messages to the real folder.
I think all that should be required to allow that would be a script to create suitable symlinks from the user's main imap folder into the snapshot folders, no? Such could be run out of cron.
-Bennett
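A minimal Python sketch of such a symlink script follows. It assumes each user's maildir contains a `.snapshot` directory holding one subdirectory per snapshot (the name `hourly.0` used in the comments is only an example), and that folders appear to the IMAP server as dot-prefixed subdirectories of the maildir. Whether Courier follows symlinked folders or copes with read-only snapshot contents is untested here; the function name and the `.Snapshot-` prefix are hypothetical.

```python
import os


def link_snapshots(maildir: str, prefix: str = ".Snapshot-") -> list:
    """Create one folder-like symlink per snapshot; return the new links."""
    snap_root = os.path.join(maildir, ".snapshot")
    created = []
    if not os.path.isdir(snap_root):
        return created
    for snap in sorted(os.listdir(snap_root)):
        target = os.path.join(snap_root, snap)
        if not os.path.isdir(target):
            continue
        # "." separates folder levels in maildir folder names, so a
        # snapshot called hourly.0 becomes .Snapshot-hourly_0.
        link = os.path.join(maildir, prefix + snap.replace(".", "_"))
        if not os.path.lexists(link):
            os.symlink(target, link)
            created.append(link)
    return created
```

Run from cron over every user's maildir, it only adds links that are missing, so repeated runs are harmless; removing links for expired snapshots is left out of this sketch.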
Hi, I'm about to start a conversion project of our University email system based on mbox folders to Courier IMAP maildir. There are about 50K users using about 250 GB of space on F840 cluster; Netapp version 6.2.2. We have been also told that as of this year each student will keep their account for life, therefore this number will keep on growing.
Wow! This is a very interesting decision. Did the powers that be consider policy problems such as email abuse? What happens if an alumnus starts using your email system to send spam? Do you delete their email account? Can they ever get it back? At least you have some leverage over enrolled students when enforcing your policies. You don't have any leverage at all over alumni, except to disable the account. Will alumni need to sign an agreement to abide by your policies?
Also, what happens if the alumni adversely affect email performance for the enrolled students, faculty, and staff? Is there a commitment to fund necessary upgrades? Is it fair to ask enrolled students to partially fund the email of alumni? If some of your funding comes from the government, is it even legal to provide this service to alumni? Is it a misuse of funds? (That is an issue here at the Univ. of Virginia.)
What sort of support are alumni entitled to? Can they call your help desk? How will that impact service for everyone else?
How are you going to handle email that just piles up when alumni never log in to read it any more? Most of our students routinely receive spam, and many are on active mailing lists. If we didn't delete their email accounts when they graduate, this stuff would just keep coming in forever.
I don't see how treating alumni and enrolled students the same can possibly be fair to the enrolled students, especially as the number of alumni keeps growing. In four years the number of alumni accounts on your email system will roughly equal the number of enrolled students, and after that, the alumni will be a growing majority.
Forgive me if I misinterpreted what you are planning to do.
I'm very concerned about the number of inodes/files/volume and the filer performance.
From all the information I could find, some of you said that you get about
  o one inode for every 32 K
  o can be increased to as much as 1 inode for every 4 K (maxfiles)
There was also some discussion that when you do this, the internal WAFL tables become so huge that it will impact performance.
I don't think the performance issue is a show stopper. If you had the choice of storing 250GB as 1 million files or 250GB as 100 files, which do you think requires less space and performance overhead? But you don't have that choice. Given that you need a large number of files, netapp will handle that situation at least as well, if not better, than anything else out there.
Bear in mind that a file under 64 bytes long consumes zero data blocks, but still consumes one inode. That is because up to 64 bytes of data can be stored in the inode itself. So even if you have one inode for every 4k data block, you could still theoretically run out of inodes, although it isn't likely. You can verify that short files use no blocks with the "du -k" command on a NFS client.
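The block accounting described here can be sketched in Python as below. It assumes 4 KB data blocks and treats files of up to 64 bytes as stored entirely in the inode; it ignores indirect blocks and any other metadata, so it is a rough estimate rather than a model of WAFL.

```python
BLOCK_SIZE = 4096   # bytes per data block (assumed)
INLINE_LIMIT = 64   # bytes that fit in the inode itself (assumed inclusive)


def data_blocks(size_bytes: int) -> int:
    """Data blocks consumed by one file of the given size."""
    if size_bytes <= INLINE_LIMIT:
        return 0
    return -(-size_bytes // BLOCK_SIZE)  # ceiling division


def space_used(sizes) -> int:
    """Total bytes of data blocks consumed by files of the given sizes."""
    return sum(data_blocks(size) for size in sizes) * BLOCK_SIZE


# A tiny file, the 4,063-byte message mentioned earlier in the thread,
# and a file just over one block.
print([data_blocks(size) for size in (10, 4063, 4097)])
```

Every one of those files still costs one inode, which is why the inode limit and the space limit have to be checked separately.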
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support
2003-02-18T14:49:19 George Kahler:
o can be increased to as much as 1 inode for every 4 K (maxfiles)
You may want to pursue that. I just checked my archive of 28,613 messages, and I've got a mean filesize of 5,892 bytes. NB that's by taking actual file sizes, not blocks used (which rounds up of course).
Of course, if you have anything _else_ besides pure maildirs on that toaster, you may have no troubles; anything else will probably bring the mean up. My overall system mean, including bin and lib dirs, and piles of rpms, logs and all, computed using blocks used (i.e. rounding up) is 30,559 bytes/file.
Oh, and I get less bloated traffic than many folks enjoy exchanging; nearly all my email is pure straight US-ASCII text/plain.
Probably be worth your while to strike a mean over your actual users' mailbox-filling habits, and find _their_ mean messagesize.
Here's a quick-n-dirty mbox mean-taker:
formail -s wc -c|perl -lne 's/\s//g;$s+=$_;$c++;END{print($s/$c)}'
formail comes with procmail. You can concatenate all your users' mboxes and pipe them into an instance of that.
-Bennett
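Where formail is not available, roughly the same mean can be taken with Python's standard `mailbox` module. This is a sketch, not a drop-in replacement: the function name is illustrative, and the sizes it reports exclude the mbox "From " separator line, so the result may differ slightly from the pipeline above.

```python
import mailbox


def mean_message_size(mbox_path: str) -> float:
    """Mean size in bytes of the messages stored in one mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        sizes = [len(box.get_bytes(key)) for key in box.keys()]
    finally:
        box.close()
    if not sizes:
        raise ValueError("no messages found in " + mbox_path)
    return sum(sizes) / len(sizes)
```

Calling it once per user mbox and combining the per-file results would give a picture of how message sizes vary across users, not just a single overall mean.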
Think about using quotas, as they allow you to limit not only space usage but also inode consumption.
The example below illustrates no space limitation (-) but an inode limitation:
  romeo  user       -  1M   # one million inodes
  romeo  user@mail  -  50K  # 50 thousand files allowed in the qtree mail for the user romeo
You would share this policy with your students in a way like: "Oye, oye. Please take care of your INBOX and subfolders, as the number of messages you may keep is limited to 100. Please delete useless mail, and save or forward interesting mail, so that old mail does not take up space in your mail account..."