When Data Joins The Dark Side - InformationWeek

When Data Joins The Dark Side

A big data stockpile may contain dark data -- unstructured, unclassified information that you can't put to good use. Maybe it's time to find it.

Quick, how much of your big data is dark?

Sure, the word "dark" is open to interpretation, so let's clarify things a bit. Gartner's IT Glossary offers this definition of dark data: Information that an organization collects, processes, and stores in its day-to-day operations, but which it largely fails to use for other purposes, including analytics or business relationships.

"Similar to dark matter in physics, dark data often comprises most organizations' universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value," Gartner stated.

But even if you know what dark data is, managing it can be tricky, said Julie Colgan, director of information governance solutions for Nuix, an enterprise software company that helps organizations manage growing volumes of unidentified, unstructured data tucked away in archives, email and collaboration systems, hard drives, and other places.

Nuix's customers include government, law enforcement, and regulatory agencies. Organizations also use the company's software for e-discovery, to proactively govern their information, and to seek out potential legal threats and opportunities.

"Dark data is the data that an organization retains, often unknowingly, that lacks any substantive control or classification," Colgan told InformationWeek in a phone interview.

As a result, organizations often are unable to benefit from it.

"Data is dark when we don't know it exists, when we can't find it, when we can't interpret it, and when we can't share or interface with it," said Colgan.

But how does data join the dark side?

"Sometimes data goes dark because we're simply too busy to deal with it, so we push it to the side and ignore it," Colgan said. "Maybe we don't have the right tools to address the scale or speed, or to shine a light on the data."

Alternatively, data can go dark when it's trapped in a repository -- a legacy archive, for instance -- that renders it difficult to access or analyze.

"We have a lot of customers interested in migrating off legacy archives," said Colgan. "They're doing so for a couple of reasons: One, a number of archives are at end of life, and (customers) want to go to a more modern platform; two, they want to migrate to the cloud."

As is often the case with big data implementations, companies may find themselves with information hoards that are needlessly large. Knowing which data to keep can prove challenging.

"They find they have more information than they need, and they want to ... make some good decisions about what to keep, how to keep it, and how to get rid of the stuff they don't need," said Colgan.

She offered this advice for companies dealing with dark data:

"Take a step back and think strategically about how information is an asset, and (how it) presents new and different kinds of risks to your organization," said Colgan. "Align that to what your risk tolerance is ... and then apply the right tools."

The goal should be to create an environment where data "isn't a constant tsunami that's drowning everyone," she added. "The old methods for managing information need to be examined and realigned."

Of course, this process includes making good decisions about "what data to keep, how to keep it, and how to get rid of the stuff you don't need," said Colgan.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

David F. Carr
User Rank: Author
8/11/2014 | 5:15:20 PM
Information hoards that are "needlessly large"
Isn't the point of big data technology that it's possible to hoard data more greedily and tease useful information out of it? Maybe you want to root out duplication or reduce the amount of data that adds liability without any compliance-oriented justification for retaining it. But if there is some potential value left in the information, don't you want to be a hoarder these days?
User Rank: Author
8/11/2014 | 11:52:14 AM
Ending the Silos
Many organizations are dark, as you describe it, because of the silos you mention. Now recognizing the value, costs, and legal protections consolidation creates, many organizations are slowly but surely pulling together their data repositories. It's challenging, but the payoffs -- as those who have accomplished the task can often attest -- are many and rich.

On the consumer side, I'm sure we can all recall instances where our data is housed multiple times within a business. Often, that results in multiple emails/calls/letters, sometimes using different information. Multiply that across millions of people and that saving alone adds up. On the legal front, not knowing what you have (and, therefore, being unable to correctly secure it at times) is a hazard for many industries.
User Rank: Author
8/11/2014 | 11:21:22 AM
E-discovery (with an eye to legal protection) has been an issue for two decades. Are cloud storage services making it any easier to manage?