Several solutions on the market provide data classification and management. I work for Intermine, Inc., a company that has provided a software solution for data classification and management since 1999. Having spent several years at Network Appliance myself, I understand the challenges of managing a growing distributed environment.
Managing the environment calls for an iterative process that includes:
1) Analysis of the information in place, including: A) Security for the information - ownership and access to the files
2) Improvement of the environment: A) Cleanup and archival or deletion of information - not just duplicates (each organization defines duplicates a bit differently; it is not just as simple as saying there are two copies of the same file.) B) Capacity recovery
3) Control of the information: A) Policy creation B) Policy management C) Data migration - tiered storage
4) Measurement: A) Understanding consumption B) Monitoring change
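To illustrate the point in step 2 that "duplicate" means different things to different organizations, here is a minimal Python sketch. It is not how any particular product works. The function names (group_files, by_content, by_name_and_size) are made up for this example, and the two definitions shown are only two of many an organization might choose.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_files(root, key_func):
    """Walk root, group files by key_func(path), keep groups of two or more."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[key_func(path)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


def by_content(path):
    """Definition 1 (assumed): files are duplicates if their bytes are identical."""
    return file_digest(path)


def by_name_and_size(path):
    """Definition 2 (assumed): same file name (case-insensitive) and same size."""
    return (os.path.basename(path).lower(), os.path.getsize(path))
```

Calling group_files(root, by_content) and group_files(root, by_name_and_size) on the same directory tree can return different sets of "duplicates", which is why the definition needs to be agreed on before any cleanup, archival, or deletion is done.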
Based on our work with hundreds of organizations, we have found that each stage in the process takes time. Many organizations want to jump straight to the migration or archival step without first taking the time to understand their data and make the decisions that will ultimately shape the migration or archival solution. The goal at each stage should be to reduce the risk, cost, and complexity of information, with the goals set as a group that includes both the information owners and the information managers.
This is an iterative process because organizations grow and change over time, and the information (data) within an organization changes as well.
Brett P. Cooper
Director
Intermine, Inc.
Brett.Cooper@Intermine.com