On Sat, 12 Feb 2005, Dirk Schmiedt wrote:
Hi, we have an application where we need to store between 4 and 20 million small files on a large drive or a 3-drive RAID system.
Hi Maren
A) How small is "small"?
About 8 KB to 64 KB.
Several current filesystems can store very small files without using any additional data blocks, by storing the file content in the inode itself. The size limit for this varies between filesystems...
Well, we initially set up a 4 GB partition for the job, and at 800,000 files we ran out of inodes. Bigger drives do solve this issue.
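The arithmetic behind running out of inodes can be sketched as follows. On ext2/ext3 the inode count is fixed when the filesystem is created: roughly the partition size divided by a bytes-per-inode ratio (chosen with `mke2fs -i`, or set directly as a count with `-N`). This is a minimal sketch under the simplifying assumptions stated in its comments; the 500 GiB volume is a hypothetical example, not a figure from this thread.

```python
# Rough inode and space budget for a filesystem that fixes its inode count
# when it is created (ext2/ext3 style). Assumptions: one inode per file,
# directories and reserved inodes ignored, no per-file block overhead.

GIB = 1024 ** 3
KIB = 1024


def inode_count(partition_bytes: int, bytes_per_inode: int) -> int:
    """Number of inodes created at a given bytes-per-inode ratio."""
    return partition_bytes // bytes_per_inode


def max_bytes_per_inode(partition_bytes: int, file_count: int) -> int:
    """Largest bytes-per-inode ratio that still gives file_count inodes."""
    return partition_bytes // file_count


def data_bytes(file_count: int, avg_file_bytes: int) -> int:
    """Raw space needed for the file contents alone."""
    return file_count * avg_file_bytes


if __name__ == "__main__":
    # Inodes on a 4 GiB partition at a few common ratios.
    for ratio in (4096, 8192, 16384):
        print(f"4 GiB at {ratio} bytes/inode: {inode_count(4 * GIB, ratio):,} inodes")

    # Space for 20 million files at the stated 8 KB and 64 KB extremes.
    for size in (8 * KIB, 64 * KIB):
        print(f"20M files x {size // KIB} KiB: {data_bytes(20_000_000, size) / GIB:,.1f} GiB")

    # Ratio that would give 20 million inodes on a hypothetical 500 GiB volume.
    print(max_bytes_per_inode(500 * GIB, 20_000_000))
```

As long as the average file is larger than the bytes-per-inode ratio, the data fills the disk before the inodes run out; inode exhaustion only bites when files are, on average, smaller than that ratio.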
Therefore we need to know the size of your "small" files. Less than 64 bytes? Less than x bytes?
Less than 64 KB.
B) For now I will assume that your database produces files of an "ugly" size of approx. 100 bytes per file...
Filesystems are hierarchical databases, usually meant to store big(!) amounts of data, that use the filenames (incl. the path) as primary keys to locate the stored data. Another focus is multiuser management. Therefore I dare to declare that any filesystem you might choose will be the "wrong" kind of database for the type of data (small data) your application produces.
Very true.
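One way to sidestep the filesystem-as-database problem for small records is to pack many of them into one large file and keep a key to (offset, length) index, so the filesystem sees a single file instead of millions. The `PackFile` class below is a hypothetical, minimal sketch of that idea: the index lives only in memory, and there is no locking, crash recovery or index persistence.

```python
import os
import tempfile
from typing import Dict, Tuple


class PackFile:
    """Hypothetical append-only container holding many small records in one file.

    The index (key -> (offset, length)) is kept in memory only; a real system
    would also persist it, lock the file, and recover from crashes.
    """

    def __init__(self, path: str) -> None:
        self.index: Dict[str, Tuple[int, int]] = {}
        # "a+b": binary, readable, and every write is appended at the end.
        self._f = open(path, "a+b")

    def put(self, key: str, data: bytes) -> None:
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        self._f.flush()  # make buffered writes visible before reading
        self._f.seek(offset)
        return self._f.read(length)

    def close(self) -> None:
        self._f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pack = PackFile(os.path.join(tmp, "records.pack"))
        pack.put("rec-1", b"first small record")
        pack.put("rec-2", b"second small record")
        print(pack.get("rec-2"))
        pack.close()
```

Writing the same key twice appends a new copy and leaves the old bytes as dead space, so a real container of this kind would also need periodic compaction.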
=> Is it possible for you to change the application?
If yes:
- Is it possible to collect the data of multiple files into single files?
Well, the data is several databases with millions of tables, using MySQL. Still, I wonder whether any of the other databases has created its own file system or "monster database" in which it stores data efficiently.
- How about changing the application to use a "real" dedicated database for managing all these small data records?
Erm... that is what we are doing. All the records could have been put into a single table, but then the table would have a few billion records. So we moved the goalposts: by splitting the data into many tables, we now have a problem at the file system level.
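A common middle ground between one table with billions of rows and millions of small tables is to hash each record key into a fixed number of tables. This is a minimal sketch assuming a string record key; the `records_NNN` naming scheme and the count of 256 are hypothetical choices. With MySQL's MyISAM engine, which stores each table as its own `.frm`, `.MYD` and `.MYI` files, 256 tables means roughly 768 files instead of millions, and if the keys hash evenly each table holds about 1/256 of the records.

```python
import zlib

# Hypothetical: a fixed number of tables, chosen up front.
NUM_TABLES = 256


def table_for(key: str, num_tables: int = NUM_TABLES) -> str:
    """Map a record key to one of num_tables table names.

    zlib.crc32 is deterministic across processes, unlike Python's built-in
    hash() for strings, so the same key always lands in the same table.
    """
    bucket = zlib.crc32(key.encode("utf-8")) % num_tables
    return f"records_{bucket:03d}"


if __name__ == "__main__":
    for key in ("customer-1", "customer-2", "order-12345"):
        print(key, "->", table_for(key))
```

Because this is plain modulo hashing, changing `num_tables` later remaps most keys, so the table count is best chosen generously from the start.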
There are many databases that can handle large numbers of small entries with lower response times than hierarchical filesystems. And these database files could be stored on a NetApp Filer... :-)
Aye, I wish we could just bundle a NetApp... but in a way it would complicate things so much: licensing, ongoing support, etc.
Before going down that path, it would be simpler to get a commercial database for the job that has its own custom file system. Any suggestions?
If no: 3) For Linux or IRIX, I would choose XFS. It's solid as a rock and flexible as a rubber band, with very dynamic, flexible inode management. As long as there is some space left in the filesystem, it will automatically create new inodes when required. But there is no built-in version control like WAFL offers. :-(
Linux is starting to look like the choice if we don't find a database that has a custom file/storage system.
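Whichever filesystem is chosen, inode headroom can be watched from the application, much as `df -i` reports it on the command line. This is a minimal sketch using Python's `os.statvfs` (Unix only); the `inode_usage` helper name is hypothetical. The reported totals are filesystem-specific: on XFS they may change as the filesystem fills, and some filesystems report a total of zero.

```python
import os
import sys
from typing import Tuple


def inode_usage(path: str) -> Tuple[int, int, float]:
    """Return (total, free, percent_used) inodes for the filesystem holding path."""
    st = os.statvfs(path)  # Unix only
    total, free = st.f_files, st.f_ffree
    # Some filesystems report 0 total inodes; treat that as 0% used.
    used_pct = 100.0 * (total - free) / total if total else 0.0
    return total, free, used_pct


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    total, free, pct = inode_usage(target)
    print(f"{target}: {free:,} of {total:,} inodes free ({pct:.1f}% used)")
```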
Assuming the worst-case scenario of 100 bytes/file, my personal choice would be (2). "Back to the filers." ;-)
I guess my "small files" are pretty big.
Thanks to all of you for the great replies.
Maren.
--------------------------------------------------------------
HKdotCOM Ltd
Tel: 852 2865-4865 ext 888  Fax: 852 2865-4100
leizaola@hk.com
AIM: MarenHKdotCOM  ICQ: 39905706  MSN: MarenHKdotCOM
--------------------------------------------------------------