Big Data: No Hoarding Allowed - InformationWeek

The best insights come from data you've just collected, not the musty bits you've saved for years, argues SumAll's CEO.

The save-everything mantra chanted by many big data proponents is a waste of money and resources, as organizations will gain little, if any, actionable insights from massive stockpiles of archived data. Rather, the real big data payback comes from near-real-time analysis of information as it's collected.

So says Dane Atkinson, CEO of SumAll, a three-year-old data analytics startup based in New York City. SumAll's platform takes in data from a variety of sources, including social media, email, and e-commerce, and allows companies to analyze the information right away.

Given the real-time nature of SumAll's business, perhaps it's no surprise that its CEO would preach the benefits of fast-acting data analysis. Then again, Atkinson isn't the only big data player to point out the shortcomings of information hoarding.

In a phone interview with InformationWeek, Atkinson noted that companies often warehouse big data at great expense, even when they're not sure what insights they'll gain from it. And if they don't know which questions to ask of it today, they're hopeful the astute queries will come months, or even years, down the road.

"That's the theory. That's exactly it: 'We don't know smart questions to ask now, so we're going to keep it all so that we can ask them later,'" said Atkinson, distilling the common rationale behind data hoarding, which he considers an expensive process with a dubious ROI.

"It costs a lot of money," he said. "It costs us millions of dollars a year to store our customers' data."

But despite the expense, the popular trend is to save it all.

"It's not even a question. Every company, every Internet company, tries to store all the data they possibly can," he claimed. "They believe in this theory of big data, that it'll someday be valuable."

(Source: W.Rebel)

Atkinson wasn't suggesting that companies stop storing data altogether, but rather that they do so more efficiently and with a clearly defined strategy.

"We would highly discourage storing it in a fashion that's sort of the definition of big data -- where you have it in some SSD environment on Amazon, or on a rack of servers that are costing you a fortune -- because you're not getting value out of it," he said. "You're not asking questions because it's just too big."

Still, companies often become data hoarders.

"They're living in the hoarder's environment," said Atkinson. "They're taking in all the data and putting it into a repository."

One alternative: Rather than saving every bit, companies should determine the questions they want to ask of their data, and then store the indexes they really need, a move that "will take your data down by many factors," he claimed.

Take a retail business, for instance.

"You may not need to have every second's worth of transactional history over the last four years, but it's probably pretty handy to know how [each] day went," said Atkinson. "So rolling up those 60 minutes into an hour metric [will] give your team really good guidance on the trends and patterns they want to see."

Rather than storing, say, the 2 billion transactions your business processed in the past two years, save an index that tallies the hourly transaction totals for that period, he added.

This approach can greatly reduce the size of your data hoard -- "gigabytes versus terabytes," claimed Atkinson.
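
The hourly rollup Atkinson describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SumAll's implementation: the function name `hourly_rollup`, the record layout (a timestamp paired with an amount), and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_rollup(transactions):
    """Collapse (timestamp, amount) records into per-hour counts and totals.

    The record shape is an assumption made for this sketch.
    """
    buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
    for ts, amount in transactions:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour]["count"] += 1
        buckets[hour]["total"] += amount
    return dict(buckets)


# Hypothetical sample data: three raw transactions spanning two hours.
sample = [
    (datetime(2014, 7, 7, 9, 5, 12), 20.00),
    (datetime(2014, 7, 7, 9, 47, 3), 5.50),
    (datetime(2014, 7, 7, 10, 1, 59), 12.25),
]

rollup = hourly_rollup(sample)
for hour in sorted(rollup):
    print(hour.isoformat(), rollup[hour])
```

Only the per-hour rows need to be kept for trend analysis. Two years of hourly totals comes to roughly 17,520 rows (2 x 365 x 24), compared with the 2 billion raw transactions in the example above.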

Again, however, he finds few businesses are slimming their data stockpiles.

"It's only the really smart companies that have started to pare that down," said Atkinson. "They may have the hoarder's closet somewhere, but they've also made a new [data] store that's much more efficient, that tries to answer smart questions and not just grab hold of everything."

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

Thomas Claburn,
User Rank: Author
7/7/2014 | 1:55:27 PM
Re: Theory vs. reality
If only someone could convince the NSA of the merits of not hoarding data.
Lorna Garey,
User Rank: Author
7/7/2014 | 1:47:24 PM
Theory vs. reality
It's all great in theory. However, to save selectively requires effort and will -- data classification programs, someone to decide to delete X set and take the fall if it's needed someday, etc. Meanwhile, storage is cheap and getting cheaper.