Can Open Source Solve Big Data For SMBs? - InformationWeek

As smaller companies deal with larger and larger volumes of data, open source software providers may help crack the cost equation, but not necessarily the complexity problem.

The boundaries of my own knowledge led me to call upon James Phillips, co-founder and SVP of products at Couchbase, for his thoughts on the open source data stack and its utility for SMBs. Phillips gave me a refresher course on the fundamental difference between transactional data and analytical data--while the former might eventually become the latter, the two are (or should be) housed separately for application performance and other reasons. Phillips stressed Couchbase's simplicity and said that on the transactional side of data, his company has legacy technologies such as Oracle and MySQL beat. The analytical side, however, has work to do in terms of ease of use.

"I would agree that some of the other technologies, particularly on the analytics side, are a bit more complex than perhaps some SMBs are comfortable with," Phillips said in an interview. "I think that's going to change over time, certainly, but some of those technologies have focused more on raw analytical power and raw science rather than customization and simplicity. If those markets are going to grow--and I believe they are on their side--then they're going to have to deal with simplicity issues."

There are without doubt SMBs, particularly web services companies and other tech startups, that have the requisite in-house expertise to harness that power today. But the open source data maze could send the broader SMB universe running faster than you can say "NoSQL." It requires a certain amount of knowledge simply to identify the right provider for the right need.

Take Cloudera--the company might be at the forefront of Hadoop-based data management, but it's not for every SMB. In the so-called data stack, Cloudera positions itself as a foundational platform, but it would be a poor choice for a small company not dealing with heavy-duty data volume. "If you don't have your hands on at least a couple of terabytes of data, Hadoop is probably not a good fit for you," Cloudera's Zedlewski said. He added that Cloudera is investing in ease of use in several ways, including working with MicroStrategy to allow its visualization interfaces to be used on top of Hadoop, as well as streamlining tools for installation and ongoing management. "We try to lower the barrier for what it takes to take advantage of this system," Zedlewski said.

For data-intensive SMBs looking to add IT staff to tackle the terabytes, Smith from Revolution Analytics offers a take that might be welcome news: Current and upcoming crops of university graduates will be increasingly well-versed in open source software, which means young (read: cheap) talent is waiting in the wings. "Being able to find technical people that can work with each of the stages and phases of this data analytics stack--as opposed to having to hire an industry veteran that's been trained for over 20 years in one of these monolithic solutions--is really getting [SMBs] much more advanced talent at a much lower cost," Smith said.

It may be a matter of time before open source truly emerges as a clear-cut option for smaller businesses grappling with growing data, particularly those making their first foray out of largely manual systems into true data management and analysis. Given the chance to mature and invest in ease of use, the open source players could ultimately become a major force for a wide range of SMBs, particularly those priced out by enterprise-focused vendors. But that could require some patience. Commenting on Cloudera, for example, Raden of Hired Brains said: "This is a different kettle of fish. It doesn't support interactive query, requires real programming skill, and is still pretty raw. Expect that to change, but for now, unless you have very large data requirements, it's not a solution."

In the meantime, the data isn't going to stop growing and flowing. The cost advantages of open source are evident, even if it requires an investment in new in-house skills. "There is a level of technical expertise required, but if you're at a size where you have data that needs to be analyzed through either ad-hoc query or structured reports, then you've outgrown Excel," said IDC's McDonough. "You're at a point as a company where you need to start looking at BI tool options."
