I've got some questions that I'm hoping people on this list may be able to help answer. Background first.
We're working on an application that uses a lot of Berkeley DB hash files, via DB_File in Perl. Right now we are looking at about 30k individual databases. Overall, the application currently hits about 1000 of these databases a minute, with a few hundred reads per use and potentially an equal number of writes. Occasionally the application does grooming on the databases, rebuilding them to reclaim space and expire unused entries. Each database will be around 5 MB.
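For the curious, the grooming pass looks roughly like this (a minimal sketch, not our actual code; groom() and is_expired() are hypothetical stand-ins for our rebuild and expiry logic):

    use strict;
    use warnings;
    use DB_File;
    use Fcntl qw(O_RDONLY O_RDWR O_CREAT);

    # Rebuild one database: copy live entries into a fresh file,
    # dropping expired ones, then rename the new file over the old.
    sub groom {
        my ($path) = @_;
        my (%old, %new);
        tie %old, 'DB_File', $path, O_RDONLY, 0644, $DB_HASH
            or die "can't open $path: $!";
        tie %new, 'DB_File', "$path.tmp", O_RDWR|O_CREAT, 0644, $DB_HASH
            or die "can't create $path.tmp: $!";
        while (my ($k, $v) = each %old) {
            $new{$k} = $v unless is_expired($k, $v);
        }
        untie %old;
        untie %new;
        rename "$path.tmp", $path or die "rename failed: $!";
    }

    # Hypothetical expiry test; the real one checks entry age.
    sub is_expired { return 0 }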
We're testing on a pair of clustered 760s with half of the anticipated load. Each filer has two shelves of 18s. The load generated by the application is enough to max out the disks on both filers. CPU utilization is around 40% and ops are generally under 1000. Disk reads average about 12 MB/s; net out is around 5 MB/s per filer.
Now, here's the question. Under Berkeley DB you can control the hash bucket size, which defaults to the filesystem block size unless otherwise specified. Does anyone know what the optimum bucket size would be on a NetApp?
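In case it helps frame an answer, this is where we'd plug in a value, via DB_File's HASHINFO parameters (a sketch; the 4096 and the cachesize are placeholders, not recommendations):

    use DB_File;
    use Fcntl qw(O_RDWR O_CREAT);

    # HASHINFO overrides hash parameters at database creation time.
    my $info = DB_File::HASHINFO->new;
    $info->{bsize}     = 4096;    # bucket size in bytes (placeholder value)
    $info->{cachesize} = 1 << 20; # optional 1 MB in-memory cache

    my %h;
    tie %h, 'DB_File', 'example.db', O_RDWR|O_CREAT, 0644, $info
        or die "tie failed: $!";

Note that the bucket size only takes effect when the file is created, so we'd have to rebuild the databases to change it.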
Any help offered in tuning the application to best use the resources is greatly appreciated.
2004-02-04T17:51:50 Kelsey Cummings:
> We're working on an application that uses a lot of Berkeley DB hash files.
Make sure you either open source your app or confine it to use within one single building.
Sleepycat has done very clever stuff in redefining "redistribute": if you use Berkeley DB in an app that's used in more than one building, you're obliged to distribute your app as open source.
Sadly, oh so sadly, the Open Source Initiative has not recognized this bit of cleverness as violating their Open Source Definition, so Sleepycat's Berkeley DB license is so far still tolerated as an Open Source license. To my taste it demeans the term "Open Source" terribly.
-Bennett