At the heart of big data is the search for "insight" -- some correlation or finding that eludes the seeker until he or she adds another terabyte or 10 of data, just in case it is lurking there.
At a certain point, the law of diminishing returns has to kick in. Adding another 100TB becomes redundant.
Vendors exercise their right to remain silent when asked, "How much data is too much?"
Big data skeptics don't have a precise answer, either. But they are more likely to speak of the limitations of big data than to shout out its promise. They act as the agnostics questioning the IT theology of the Hadoop evangelists.
For Cathy O'Neil, who holds a doctorate in mathematics from Harvard University and has worked in academia and the private sector, the issue is less about the law of diminishing returns and more about people not understanding the data.
The technology is "encouraging people to use algorithms they don't understand," O'Neil said in an interview. "You don't need a lot of data [to not] know what you are doing."
O'Neil's skepticism is well grounded in her experiences, since her career wound its way through academia, to Wall Street, and then to the New York City startup scene. She has seen the plain gap between the technologists who craft the algorithms and the business people who rely on them.
Data is just a way of codifying information, O'Neil explained. Any data gathered should be relevant to the problem at hand; otherwise, useless data clouds the results of a query.
"If there are too many degrees of freedom, you are begging for a spurious correlation," O'Neil said.
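O'Neil's point can be illustrated with a small simulation. What follows is a minimal sketch, not anything O'Neil presented: it assumes a data set of 30 rows, fills it with columns of pure random noise, and reports the strongest correlation any of those columns shows with an equally random target. The function names, the row count, and the column counts are all hypothetical choices made for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def strongest_chance_correlation(n_rows, n_columns, seed=0):
    """Largest |r| between a random target and n_columns unrelated random columns.

    Every value is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_columns):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(target, column)))
    return best

# Assumed sizes for illustration: 30 rows, a growing number of noise columns.
for n_columns in (1, 10, 100, 1000):
    r = strongest_chance_correlation(n_rows=30, n_columns=n_columns)
    print(f"{n_columns:>5} unrelated columns -> strongest |r| = {r:.2f}")
```

Because none of the columns has any real relationship to the target, whatever correlation the search turns up is an accident of chance, and the more columns it is allowed to try, the stronger the best accident tends to look.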
"A good data scientist is a data skeptic and is pushing against group think," O'Neil continued. "Know what you don't know. It's hard." The business side "wants to come out with positive news," she continued. "What if you are wrong? Do we have a backup plan? Can we test against ground truth?"
O'Neil is in good company as she tries to balance out the various needs of big data.
"My joke is that the biggest innovation [in big data] was when Excel moved from 64,000 to 1 million rows," quipped Caribou Honig, a founding partner of venture capital firm QED Group, which is based in Alexandria, Va.
There are uses for big data in fields like genomics, Honig noted. But there are "tons of high impacts that companies can drive from small data techniques," he said. "Big data methods are substituting for actually thinking through the problem."
"I'd rather have five orthogonal modest data sets than one ginormous data set along a single axis," Honig added. That is where the law of diminishing returns kicks in.
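One way to see the diminishing returns is through a textbook statistical fact, offered here as a rough sketch rather than as Honig's own reasoning. If the rows are independent measurements of the same quantity (an assumption, and a generous one for real-world data), the uncertainty of an average shrinks only with the square root of the row count. The function name and the row counts below are hypothetical.

```python
from math import sqrt

def standard_error(sigma, n_rows):
    """Standard error of a mean over n_rows independent observations.

    Assumes every row is an independent draw with standard deviation sigma.
    """
    return sigma / sqrt(n_rows)

sigma = 1.0  # assumed spread of a single observation
for n_rows in (10_000, 1_000_000, 100_000_000):
    se = standard_error(sigma, n_rows)
    print(f"{n_rows:>11,} rows -> standard error {se:.4f}")
```

Under that assumption, 100 times more data of the same kind buys only a tenfold gain in precision, while a data set that measures something different can answer a question the first one cannot address at any size.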
Like any buzzword, big data passes through the stages of the Gartner hype cycle: Promise, excitement, oversell, trough of disillusionment, then some discovery of practical usage, Honig said. People are filtering out the promises, and "using big data to make a difference, not because we can."
But how big is big data?
It depends on who you ask. IDC reported November 24 that business analytics spending would reach $58.6 billion by the end of the year, and it would grow to $101.9 billion by 2019.
However, in Honig's view, the big data of five years ago is not big data today. New tools and techniques are making it possible to analyze big data sets that were simply too big five years ago. "The goal is constantly moving," he said.
There are three constraints to big data: It is hard to use, it is hard to find, and it is hard to find people who have the skills and judgment to use it, observed Andrew Horne, practice leader at CEB, a "best practice" insight and technology company.
About 62% of all people who work on big data solutions lack the skills and judgment to use the data, Horne said. It's like giving an unlicensed driver a powerful sports car -- the person doesn't know how to drive, and the extra horsepower is wasted.
CEB had surveyed about 5,000 employees at 30 companies to come up with that finding several years ago, Horne said. Since then, any gain in knowledge by the pool of users has been offset by the increasing complexity of the tools being developed to wrangle big data.
"There needs to be something in between," Horne said. There has to be enough confidence in the data, but also the ability to step back and use judgment. It is also difficult to find such people, since the search is a task that falls between the departmental cracks of the typical corporation, he continued.
Adding to the problem is the gap between the trainers and the trainees.
"When you bring in a new big data tool, you need to bring in the people on the data as well," Horne said.
Data scientists should bridge the gap between the vendors and the users, because they know their way around the data. How the data is accessed determines how much value, and how good a result, comes out of it, Horne continued.
Data is not always in one place and may not be labeled consistently.
The challenge is to understand the quality of the data and determine what can be done with it. "You are helping people find data," Horne said.
William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...