This may be a fruitless search for information, but I'm interested in hearing how folks are handling disk-usage stats on large quantities of data (say 2 TB or more). Of particular interest to me are environments with lots and lots of small files rather than a modest number of large ones. Knowing that the network wire is usually the bottleneck, we run our du's from an admin host on the same isolated 100BASE-T server network as our six filers. But because of a known incompatibility between the network card in our admin host and the Bay Networks switch we're using, the card can only successfully negotiate half-duplex, not full. For lack of other suitable hosts to run these reports from, I now turn to this list for some collective brainstorming.
Thanks, Derek Kelly
----- Original Message -----
From: Derek Kelly derek.kelly@genomecorp.com
To: toasters@mathworks.com
Sent: Tuesday, February 29, 2000 7:57 AM
Subject: Running du's
Give every user a quota, but set it so outrageously high they will never encounter it. Then the quota report will give you instant usage for any user, and it will count everything on that filesystem, not just the stuff in his home directory.
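For what it's worth, a minimal sketch of that trick as an /etc/quotas entry on the filer (the volume name vol1 and the 500 GB figure are placeholders, and I believe the disk column is in kilobytes, so check your ONTAP docs before copying this):

    # /etc/quotas -- default per-user quota set far above any realistic usage
    # target   type            disk       files
    *          user@/vol/vol1  524288000  -

Then "quota on vol1" followed by "quota report" at the filer console should show per-user usage across the whole volume almost instantly. Putting a "-" in the disk column instead of a number should make it a pure tracking quota, which never limits anybody at all.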
Bruce
Hello,
We were gathering similar information here with a nightly du. It got to the point where it would run for six hours straight, and if it hadn't finished by the time the engineers came in, the NetApp would be slow and the read cache was likely rendered useless; du seems to read pretty much everything on disk under the directory you point it at. We came up with a method using LSF where we break off the subdirectories under the one to be du'd, submit each as a separate LSF job, and sum everything up at the end. Using this we could hammer a NetApp from 60+ Suns for 20 minutes to complete the du. I never felt comfortable with it.

You have probably noticed, though, that qtrees can be made to report disk usage by UID, using the /home example syntax. What we do now is put EVERYTHING into a qtree, even if we don't care about space usage, and quota it; this lets us run quota reports and see actual disk usage, and where it's important to break it down by UID we add that parameter to the quotas entry. This mimics much of the functionality of du but completes in milliseconds. It obviously won't let you count usage by directory within a qtree, but you can probably work around that with clever use of multiple qtrees and automount.
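A rough sketch of the kind of /etc/quotas entries I mean (the volume and qtree names vol1 and proj are made up; "-" in the disk and files columns means track only, no limit enforced):

    # /etc/quotas -- tracking quotas, for reporting only
    # target             type                 disk  files
    /vol/vol1/proj       tree                 -     -
    *                    user@/vol/vol1/proj  -     -

The tree line gives you the total for the qtree in "quota report", and the user@ line breaks that same qtree down by UID, which covers most of what we used the du for.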
Justin Acklin