New Tools For Finding Data And Documents Quickly
Content-addressed storage technology can help businesses preserve documents and find them easily
There's been a lot of buzz in legal circles recently about United States v. KPMG. The feds accused the accounting firm of cooking up illegal tax shelters for rich clients from 1996 to 2003. What caught our eye isn't the $456 million the firm will pay or even the $2.5 billion in evaded taxes. We noticed that the case thus far has generated, in electronic or paper form, 5 million to 6 million pages of discoverable documents, of all shapes, sizes, and types. That's a prime example of why data-retention and digital-discovery requirements have lit a fire under the normally staid archival market.
Vendors are touting content-addressed storage, or CAS, as a way to make discovery requests more manageable. In a nutshell, a CAS system locates data by an address that the array assigns based on the content itself, typically a hash of the data, rather than by physical address or directory. Since the CAS device completely abstracts data from the hardware on which it resides, documents can be found based on content, rather than by storage location.
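To make the idea concrete, here is a minimal sketch of content addressing in Python. It is not how any particular vendor's product works: the `ContentStore` class, the in-memory dictionary, and the choice of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # content address -> raw bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves, not from a path.
        # SHA-256 is an assumption; real products may use other digests.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read checks that the content still matches its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"engagement letter, 1999")
document = store.get(addr)
```

The caller never names a directory or a disk. It keeps only the returned address, and presenting that address later retrieves the same bytes wherever the system happens to hold them.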
The earliest entry into this market, EMC's Centera, first released in 2002, is still the clear leader in terms of CAS-capable units, mainly because EMC was first with a strong play. Today, competitors big and small, including Caringo, Hewlett-Packard, Hitachi, IBM, Nexsan, and Sun Microsystems, are bullish on CAS. We expect every major storage vendor to provide some iteration of CAS, albeit under the guise of a "complete archive management system." Some have entries already, and we expect others to follow suit in the next 24 months.
Digital Fingerprints
A CAS system comprises storage nodes, where data is physically kept, and access nodes, where metadata and information on the data's location on the storage nodes are kept. Because each address acts as a digital fingerprint of the content, a document with even a small change is saved separately from the original copy, which provides versioned storage. Identical content, by contrast, resolves to the same address, so CAS can cut down on duplication, and thus storage space requirements. Some vendors use this capability to keep only one copy of a given data set, removing the duplicates usually found on standard location-addressed storage.
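The fingerprinting, versioning, and single-copy behavior described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a vendor design: the `Archive` class is hypothetical, its `storage` dictionary merely stands in for storage nodes, its `catalog` dictionary stands in for the metadata held on access nodes, and SHA-256 is an assumed fingerprint function.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # Assumed fingerprint function; any change to the bytes changes the result.
    return hashlib.sha256(data).hexdigest()


class Archive:
    """Hypothetical model of a CAS archive with deduplication and versions."""

    def __init__(self):
        self.storage = {}  # stands in for storage nodes: fingerprint -> bytes
        self.catalog = {}  # stands in for access nodes: name -> fingerprints

    def save(self, name: str, data: bytes) -> str:
        fp = fingerprint(data)
        # Identical bytes map to the same fingerprint, so they are kept once.
        self.storage.setdefault(fp, data)
        versions = self.catalog.setdefault(name, [])
        # Record a new version only when the content actually changed.
        if not versions or versions[-1] != fp:
            versions.append(fp)
        return fp


archive = Archive()
original = archive.save("memo.txt", b"Tax memo, draft")
duplicate = archive.save("copy-of-memo.txt", b"Tax memo, draft")
revised = archive.save("memo.txt", b"Tax memo, draft.")
```

In this sketch the duplicate file adds a catalog entry but no new stored bytes, while the one-character revision gets its own fingerprint and is appended to the version history of `memo.txt`.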
The story isn't all positive: Many CAS devices have significant shortcomings. For example, metadata standardization is nonexistent. The Storage Networking Industry Association is creating a standard that will allow for the migration of XML-based metadata between different CAS systems, but those efforts are incomplete. Keep an eye on SNIA and ask your vendors about plans to implement eventual CAS standards.