When Data Joins The Dark Side - InformationWeek

When Data Joins The Dark Side

A big data stockpile may contain dark data -- unstructured, unclassified information that you can't put to good use. Maybe it's time to find it.

Quick, how much of your big data is dark?

Sure, the word "dark" is open to interpretation, so let's clarify things a bit. Gartner's IT Glossary offers this definition of dark data: Information that an organization collects, processes, and stores in its day-to-day operations, but which it largely fails to use for other purposes, including analytics or business relationships.

"Similar to dark matter in physics, dark data often comprises most organizations' universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value," Gartner stated.

But even if you know what dark data is, managing it can be tricky, said Julie Colgan, director of information governance solutions for Nuix, an enterprise software company that helps organizations manage growing volumes of unidentified, unstructured data tucked away in archives, email and collaboration systems, hard drives, and other places.


Nuix's customers include government, law enforcement, and regulatory agencies. Organizations also use the company's software for e-discovery to proactively govern their information and to seek out potential legal threats and opportunities.

"Dark data is the data that an organization retains, often unknowingly, that lacks any substantive control or classification," Colgan told InformationWeek in a phone interview.

As a result, organizations often are unable to benefit from it.

"Data is dark when we don't know it exists, when we can't find it, when we can't interpret it, and when we can't share or interface with it," said Colgan.


But how does data join the dark side?

"Sometimes data goes dark because we're simply too busy to deal with it, so we push it to the side and ignore it," Colgan said. "Maybe we don't have the right tools to address the scale or speed, or to shine a light on the data."

Alternatively, data can go dark when it's trapped in a repository -- a legacy archive, for instance -- that renders it difficult to access or analyze.

"We have a lot of customers interested in migrating off legacy archives," said Colgan. "They're doing so for a couple of reasons: One, a number of archives are at end of life, and (customers) want to go to a more modern platform; two, they want to migrate to the cloud."

As is often the case with big data implementations, companies may find themselves with information hoards that are needlessly large. Knowing which data to keep can prove challenging.

"They find they have more information than they need, and they want to ... make some good decisions about what to keep, how to keep it, and how to get rid of the stuff they don't need," said Colgan.

She offered this advice for companies dealing with dark data:

"Take a step back and think strategically about how information is an asset, and (how it) presents new and different kinds of risks to your organization," said Colgan. "Align that to what your risk tolerance is ... and then apply the right tools."

The goal should be to create an environment where data "isn't a constant tsunami that's drowning everyone," she added. "The old methods for managing information need to be examined and realigned."

Of course, this process includes making good decisions about "what data to keep, how to keep it, and how to get rid of the stuff you don't need," said Colgan.


Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

8/23/2014 | 2:25:08 AM
Re: Silos and lack of strategy
So true. Using data without a long-term vision of where it can be used, what metrics and analysis can be derived from it, and a proper data strategy means just one thing: plenty of blind spots and chaos. The price of acquiring data that isn't merely stale but dead/dark is often paid by IT data strategy personnel (though the cost is billed to the company). You need a tool like HAVEN to make more sense of it (goo.gl/HFdxfV)
8/15/2014 | 11:04:21 AM
Silos and lack of strategy
As another commenter posted, silos are a serious contributing factor in organizations accumulating dark data. However, as a recent IDG SAS survey showed, the surprising lack of a data strategy plays a key role here as well. Without a serious understanding of what you want to get out of data, and of how to get it, data will fail to realize its full potential.


Peter Fretty
8/12/2014 | 11:13:06 AM
What about common sense?
If data is collected and not used, shouldn't the first reaction be to stop collecting it?

This seems to go against the Big Data "goal" of collecting everything and trying to find something (or possibly anything). But it may be far more costly to keep data of little to no value, considering that this dark data may get stolen (since it is dark data, would you even know it was stolen?), resulting in potential legal fines and loss of trust (i.e., loss of customers, investors, and partners).
8/12/2014 | 1:55:02 AM
Re: E-discovery
The challenges are obvious. Dark data does raise eDiscovery costs: if the organization is in litigation, reviewing the case can only increase those costs. It also consumes a great deal of IT resources. This can be time-consuming and stressful for IT personnel, given that they may have to restore or identify files that are hard to locate.
David F. Carr
8/11/2014 | 5:15:20 PM
Information hoards that are "needlessly large"
Isn't the point of big data technology that it's possible to hoard data more greedily and tease useful information out of it? Maybe you want to root out duplication or reduce the amount of data that adds liability without any compliance-oriented justification for retaining it. But if there is some potential value left in the information, don't you want to be a hoarder these days?
8/11/2014 | 11:52:14 AM
Ending the Silos
Many organizations are dark, as you describe it, because of the silos you mention. Now, recognizing the value, cost savings, and legal protections that consolidation creates, many organizations are slowly but surely pulling together their data repositories. It's challenging, but the payoffs -- as those who have accomplished the task can often attest -- are many and rich.

On the consumer side, I'm sure we can all recall instances where our data was stored multiple times within a business. Often, that results in multiple emails/calls/letters, sometimes using different information. Multiply that across millions of people, and the savings from eliminating that duplication alone add up. On the legal front, not knowing what you have (and therefore sometimes being unable to secure it correctly) is a hazard for many industries.
8/11/2014 | 11:21:22 AM
E-discovery (with an eye to legal protection)  has been an issue for two decades. Are cloud storage services making it any easier to manage?