Commentary
5/30/2019 07:00 AM
Alex Gorelik, Founder and CTO, Waterline Data

Top Reasons Big Data in the Cloud Is Raining on On-Premise

The marketplace is finally flush with analytics-specific services that deliver on the cloud's promise of reduced cost and complexity, and greater agility.

According to analysts, the cloud revolution is well underway. Synergy Research says cloud services are eating into on-premise technology growth. Forrester says cloud computing is “coming of age as the foundation for enterprise digital transformation” in its Predictions 2019: Cloud Computing report.

However, while companies have spent the last few years shifting a wide variety of IT components to the cloud, they have been much slower to move big data services away from their internal infrastructures. Early adopters of Hadoop and other large-scale data analytics technologies had to keep things in-house because these were essentially still experimental technologies.

Now, companies starting their analytics forays are finding that Hadoop is simply too damn hard, while cloud vendors have come a long way with their data services. Taken together, these developments mean the cloud better suits companies' big data needs for the following reasons:

The physical implementation of a cluster is too much effort

Why buy a cluster of servers when you can have AWS or Azure spin up a bunch of them for you? As is the case with all cloud services, you don’t have to order the hardware, power the servers, or even cable them up. Constructing the physical environment alone is hard enough, not to mention getting the actual software up and running.
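To make that concrete, here is a minimal sketch of what “spinning up a bunch of them” can look like through an API, using AWS EMR via the boto3 SDK. The article names AWS and Azure but no specific service; EMR, the region, the instance types, and the cluster name below are all illustrative assumptions.

```python
# Hedged sketch: provisioning a managed Hadoop/Spark cluster on AWS EMR.
# The service choice, sizes, and names are illustrative assumptions,
# not anything the article prescribes.
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

response = emr.run_job_flow(
    Name="analytics-experiment",       # hypothetical cluster name
    ReleaseLabel="emr-5.29.0",         # an EMR release current as of 2019
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE",   "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,   # stay up for interactive work
        "TerminationProtected": False,         # allow easy teardown later
    },
    JobFlowRole="EMR_EC2_DefaultRole",  # default EMR roles, assumed to exist
    ServiceRole="EMR_DefaultRole",
)
print("Cluster starting:", response["JobFlowId"])
```

No racks, no cabling, no power budget: the “physical environment” is one API call.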

Skills shortage

This is a major problem that continues to plague big data. Cloud vendors are continually chipping away at big data’s ease-of-use problem by providing more automation. By automatically spinning massive computing clusters up and down, cloud providers significantly reduce the need for people with deep expertise in running them, which matters because those specialists remain hard to find.

Reduced risk

One huge advantage of the cloud, especially for big data implementations, is that it dramatically mitigates risk. You don’t know up front whether your data will contain great revelations. But with cloud vendors, you can spin up a cluster, do some work, and then spin it back down if you can’t unearth insights of any value, all without incurring much overall project risk. Better yet, if you do find something potentially game-changing in your data, you can quickly spin up more systems to scale your project without spending time and money purchasing and implementing systems and software.

Of course, scaling up and down does not work for all use cases. Sometimes you must ramp up systems in the cloud and keep them running due to the nature of the project or the data. Nonetheless, it’s a lot easier to get that done in the cloud, which contributes greatly to risk reduction.
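To make the “spin it back down” half concrete as well, teardown is a single call; this sketch matches the hypothetical EMR cluster above, with a placeholder cluster ID.

```python
# Hedged sketch: tearing the experimental cluster back down once the
# exploration is done (or has failed fast). The cluster ID is a placeholder.
import boto3

emr = boto3.client("emr", region_name="us-east-1")
emr.terminate_job_flows(JobFlowIds=["j-EXAMPLE12345"])  # stop paying for idle nodes
```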

Incremental cost vs. big up-front investments

Directly related to the risk point above is the associated cost. Big data–related cloud deployments allow consumers to pay only for the services they use. The good news is that if your experimental project yields little value, your losses will be reduced significantly, assuming you fail fast. By contrast, if you buy all the equipment only to see the project get shut down, your initiative becomes an expensive failure.
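As a back-of-the-envelope illustration of that trade-off (every figure below is invented for the example, not taken from the article):

```python
# Invented numbers purely to illustrate the incremental-cost argument.
onprem_capex = 500_000   # up-front hardware and software for a cluster ($)
cloud_rate = 50.0        # hourly cost of a comparable cloud cluster ($/hour)

# Suppose the experiment runs two weeks, 8 hours of cluster time a day,
# and then fails fast:
hours_used = 14 * 8
cloud_cost = cloud_rate * hours_used

print(f"Failed-fast cloud cost: ${cloud_cost:,.0f}")    # $5,600, then walk away
print(f"Failed on-prem cost:    ${onprem_capex:,.0f}")  # $500,000 already sunk
```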

Elasticity

The elasticity of the cloud allows faster time to insight. When you build a physical cluster, you are limited in how much processing you can do. A massive analytics job could take 10 hours on a 100-node cluster. With the cloud, for the same price, you can spin up 1,000 nodes to run your job in an hour.
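That claim is really just node-hours arithmetic, as a quick sanity check shows (the per-node-hour price is an assumption):

```python
# Same total node-hours, so, to a first approximation (ignoring scaling
# overhead and pricing tiers), the same bill at 10x the speed.
price_per_node_hour = 0.20   # assumed rate, $/node-hour

small_cluster = 100 * 10     # 100 nodes for 10 hours -> 1,000 node-hours
large_cluster = 1_000 * 1    # 1,000 nodes for 1 hour -> 1,000 node-hours
assert small_cluster == large_cluster

print(f"Either way: ${small_cluster * price_per_node_hour:,.2f}")  # $200.00
```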

Elasticity is also key to helping organizations share massive data sets. Moving large data sets around is always a challenge. Even sharing them within an organization can be problematic because adding new users introduces load on a system. For example, if business unit A wants access to business unit B’s data, there might not be enough compute power to support more users. When the data is sitting in the cloud, it’s much easier to add capacity without having to duplicate the data. (Even if data needs to be duplicated, that process can happen quickly and easily in the cloud.)
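One common way to let business unit A read business unit B’s data without copying it is a bucket policy on a shared object store. A sketch assuming AWS S3, with a made-up bucket name and account ID:

```python
# Hedged sketch: cross-account read access to shared data in S3, so unit A
# can query unit B's bucket in place instead of duplicating it.
# The bucket name and the 12-digit account ID are placeholders.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # unit A's account
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::unit-b-data-lake",    # bucket itself (for listing)
            "arn:aws:s3:::unit-b-data-lake/*",  # objects within it (for reads)
        ],
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="unit-b-data-lake", Policy=json.dumps(policy))
```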

Big data may have been late to the party, but the marketplace is finally flush with analytics-specific services that deliver on the cloud’s promise of reduced cost and complexity, and greater agility. 

Alex Gorelik is author of O’Reilly Media's “The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science”, and the founder and CTO of data cataloging company Waterline Data. Prior to Waterline Data, Gorelik served as senior vice president and general manager of Informatica’s Data Quality Business Unit, driving R&D, product marketing and product management for an $80 million business. He joined Informatica from IBM, where he was an IBM Distinguished Engineer for the Infosphere team. IBM acquired Gorelik’s second startup, Exeros (now Infosphere Discovery), where he was founder, CTO and vice president of engineering. Previously, he was cofounder, CTO and vice president of engineering at Acta Technology, a pioneering ETL and EII company, which was subsequently acquired by Business Objects.

 
