I find it's really quite common to underestimate the workload involved in traversing a billion files across a few hundred terabytes, and how much knock-on impact that cascades down the stack. (It's passable on a quiescent system, but can _really_ hurt when the system is under load, and push your latency up quite significantly.)
In particular - a lot of the performance of storage arrays comes down to efficient caching, and deep file traversal doesn't cache well: every file gets touched once and never revisited. So you've got a heavy 'as fast as you can' read workload that _has_ to go to the back-end disks, and because it's read-heavy the requester is waiting on every I/O - it's effectively a real-time constraint. (A rough back-of-envelope sketch with made-up but plausible numbers follows below.)
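Back-of-envelope only - all of these figures are assumptions for illustration, not measurements from any particular array:

    # Rough back-of-envelope for a full traversal that misses cache.
    # All numbers here are illustrative assumptions, not measurements.

    files = 1_000_000_000          # ~a billion files to stat/open/read
    iops_per_disk = 200            # assumed per-spindle random read IOPS
    disks = 200                    # assumed back-end spindle count
    ops_per_file = 2               # e.g. one metadata lookup + one read

    total_ops = files * ops_per_file
    array_iops = iops_per_disk * disks

    seconds = total_ops / array_iops
    print(f"{seconds / 3600:.0f} hours if the scan gets the whole array to itself")
    # -> ~14 hours at 40,000 IOPS - and that's with *no* competing workload.

Even with generous assumptions, the scan eats most of a day of the array's entire back-end capacity, which is exactly the load you feel when the system isn't quiescent.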
On-access or on-write scanning is similar - the average figures look OK, but the _peak_ latency figures start to really hurt. That's just the way latency works - once you're into 'congestion' territory, small load increases amplify into quite substantial latency increases, and performance _really_ starts to suffer.
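To put rough numbers on that amplification, here's a toy M/M/1 queueing sketch - the 5 ms service time is an assumption, and real arrays queue far less cleanly than this:

    # Minimal M/M/1 sketch of why latency blows up near saturation.
    # The service time is an assumed figure, purely for illustration.

    service_ms = 5.0  # assumed mean back-end service time per I/O

    def response_time_ms(utilisation: float) -> float:
        """M/M/1 mean response time: service_time / (1 - utilisation)."""
        return service_ms / (1.0 - utilisation)

    for u in (0.5, 0.7, 0.9, 0.95):
        print(f"{u:.0%} busy -> ~{response_time_ms(u):.0f} ms average latency")
    # 50% busy -> ~10 ms, 90% busy -> ~50 ms, 95% busy -> ~100 ms:
    # the last slice of 'spare' capacity costs far more latency than the first half.

That's why a scan that looks harmless on average can double or triple tail latency the moment the array is already busy.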
Offloading 'on access' scanning to the clients is about the only way to distribute this load widely enough.
Of course, in an ideal world you'll have ample storage performance in reserve, and this will never be an issue. Maybe when we're all-SSD everywhere I'll revise my opinion. I think that day is still a way off though...
(reply to the whole list this time :))