New Tools For Finding Data And Documents Quickly - InformationWeek


10/20/2006
10:25 AM

New Tools For Finding Data And Documents Quickly

Content-addressed storage technology can help businesses preserve documents and find them easily

There's been a lot of buzz in legal circles recently about United States v. KPMG. The feds accused the accounting firm of cooking up illegal tax shelters for rich clients from 1996 to 2003. What caught our eye isn't the $456 million the firm will pay or even the $2.5 billion in evaded taxes. We noticed that the case thus far has generated, in electronic or paper form, 5 million to 6 million pages of discoverable documents, of all shapes, sizes, and types. That's a prime example of why data-retention and digital-discovery requirements have lit a fire under the normally staid archival market.

Vendors are touting content-addressed storage, or CAS, as a way to make discovery requests more manageable. In a nutshell, a CAS system locates data by an address that the array assigns based on the content itself, typically a hash of the data, rather than by physical address or directory path. Because the CAS device completely abstracts data from the hardware on which it resides, documents can be found based on content rather than storage location.
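To make the idea concrete, here is a minimal sketch of content addressing in Python. It is a toy and not any vendor's implementation: the `ContentStore` class, its in-memory dictionary, and the choice of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical): the address is a digest of the bytes."""

    def __init__(self):
        self._objects = {}  # address (hex digest) -> content bytes

    def put(self, content: bytes) -> str:
        # The caller never picks a location; the address comes from the content.
        # SHA-256 is an assumption here, not a statement about any product.
        address = hashlib.sha256(content).hexdigest()
        self._objects[address] = content
        return address

    def get(self, address: str) -> bytes:
        content = self._objects[address]
        # Re-hash on read: a mismatch means the stored bytes were altered.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return content


store = ContentStore()
memo = b"Engagement letter, revision 1"
address = store.put(memo)        # address depends only on the bytes
retrieved = store.get(address)   # lookup needs no path or disk location
```

Because the address is a function of the bytes alone, the same lookup works no matter which disk or node ends up holding the object, and the re-hash on read doubles as an integrity check.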

The earliest entry into this market, EMC's Centera, first released in 2002, is still the clear leader in terms of CAS-capable units, mainly because EMC was first with a strong play. Today, competitors big and small, including Caringo, Hewlett-Packard, Hitachi, IBM, Nexsan, and Sun Microsystems, are bullish on CAS. We expect every major storage vendor to provide some iteration of CAS, albeit under the guise of a "complete archive management system." Some have entries already, and we expect others to follow suit in the next 24 months.

Digital Fingerprints

A CAS system comprises storage nodes, where data is physically kept, and access nodes, which hold metadata and a record of where each object resides on the storage nodes. Because the address is derived from the content, a document with even a small change is saved separately from the original, which gives each version its own digital fingerprint and provides versioned storage. The same property lets CAS cut down on duplication, and thus storage space requirements: some vendors use it to keep only one copy of a given data set, removing the duplicates usually found on standard location-addressed storage.
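The split between storage nodes and access nodes, and the way fingerprints yield both versioning and deduplication, can be sketched as follows. This is a simplified illustration under stated assumptions: the `StorageNode` and `AccessNode` classes are hypothetical, a single storage node stands in for a cluster, and SHA-256 stands in for whatever fingerprint a given product uses.

```python
import hashlib


class StorageNode:
    """Hypothetical storage node: holds object bytes keyed by fingerprint."""

    def __init__(self):
        self.objects = {}  # fingerprint -> content bytes

    def write(self, content: bytes) -> str:
        fingerprint = hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption
        # Identical content maps to the same key, so it is stored only once.
        self.objects.setdefault(fingerprint, content)
        return fingerprint


class AccessNode:
    """Hypothetical access node: holds metadata mapping names to fingerprints."""

    def __init__(self, storage: StorageNode):
        self.storage = storage
        self.versions = {}  # document name -> fingerprints, oldest first

    def save(self, name: str, content: bytes) -> str:
        fingerprint = self.storage.write(content)
        history = self.versions.setdefault(name, [])
        # Record a new version only when the content actually changed.
        if not history or history[-1] != fingerprint:
            history.append(fingerprint)
        return fingerprint

    def read(self, name: str, version: int = -1) -> bytes:
        return self.storage.objects[self.versions[name][version]]


storage = StorageNode()
access = AccessNode(storage)

original = access.save("memo.txt", b"Tax shelter memo")
duplicate = access.save("copy-of-memo.txt", b"Tax shelter memo")
revised = access.save("memo.txt", b"Tax shelter memo.")  # one character added
```

Here `original` and `duplicate` are the same fingerprint, so the two file names share one stored object, while the one-character revision gets its own fingerprint and becomes a second version of `memo.txt`. The earlier version stays readable through `access.read("memo.txt", 0)`.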

The story isn't all positive: Many CAS devices have significant shortcomings. For example, metadata standardization is nonexistent. The Storage Networking Industry Association is creating a standard that will allow for the migration of XML-based metadata between different CAS systems, but those efforts are incomplete. Keep an eye on SNIA and ask your vendors about plans to implement eventual CAS standards.
