Snowflake Computing, led by Bob Muglia, former senior VP of Microsoft's Server & Tools Division, announced Tuesday that it has detached the data warehouse from its typical on-premises location and set it into the cloud.
A data warehouse built to operate in the cloud can take on big-data characteristics, handling both structured and unstructured data. It can exploit the cloud's elasticity for large analysis jobs, store data on inexpensive cloud volumes, then shut itself down at the end of the day.
That, in a nutshell, describes some of the characteristics of the Snowflake Elastic Data Warehouse, designed to run on Amazon Web Services and potentially other cloud architectures. There are already data warehouses available in the cloud, but most of them, with the exception of Amazon's Redshift, IBM's BLU Acceleration for Cloud, and Microsoft Azure's Data Factory, were not designed to take advantage of all the cloud's characteristics.
Snowflake has written its own system to handle unstructured data on a massively parallel processing cluster that can be spun up in the cloud on demand. But don't call it another NoSQL system. Snowflake's engineering team has watched the NoSQL systems try to layer in SQL query capabilities, and concluded those systems haven't gotten as far in employing SQL as some of their early adopters hoped.
"Why not take a SQL database system and extend it to support NoSQL data? That's the contrarian element of what we've done," Muglia said in an interview.
"We built this on a very different architecture than a relational system or any of the Hadoop systems," such as Hortonworks or Cloudera, Muglia continued. It allows multiple data warehouse tasks to be processed at the same time, provided they involve mainly data reads, with few data writes, as most data warehouse tasks do. Scaling the system to do multiple simultaneous tasks is part of its design, he said.
The design doesn't let writes block reads: an analytical process that's underway completes against the data it started with, even if some of the underlying records change before it finishes. In that respect it resembles the NoSQL systems that practice eventual consistency rather than relational databases' strict ACID consistency. But when no writes touch the data in use, the warehouse functions as a typical relational system.
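The idea of a query that keeps seeing its starting data can be sketched with versioned rows: a reader pins the table version in effect when it begins, and later writes create new versions it simply ignores. This is a toy illustration of the concept only, not Snowflake's implementation.

```python
# Toy sketch of non-blocking snapshot reads: a long-running query sees
# the data as it was when the query began, even if writes land mid-flight.
# Illustrative only -- not Snowflake's actual engine.

class VersionedTable:
    def __init__(self):
        self.version = 0
        self.rows = {}          # key -> list of (version, value)

    def write(self, key, value):
        self.version += 1       # each write produces a new table version
        self.rows.setdefault(key, []).append((self.version, value))

    def snapshot(self):
        return Snapshot(self, self.version)

class Snapshot:
    """A read view pinned to the table version at creation time."""
    def __init__(self, table, version):
        self.table, self.version = table, version

    def read(self, key):
        # newest value written at or before the pinned version
        for v, value in reversed(self.table.rows.get(key, [])):
            if v <= self.version:
                return value
        return None

t = VersionedTable()
t.write("region", "west")
snap = t.snapshot()                 # analytical query starts here
t.write("region", "east")           # concurrent write doesn't block the read
print(snap.read("region"))          # -> west: the query sees its starting data
print(t.snapshot().read("region"))  # -> east: a fresh query sees the update
```

The write neither waits for the reader nor invalidates it, which is the property the article describes.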
Snowflake isn't a data warehouse of fixed dimensions, whether big data or routine enterprise data. Rather, it's a virtual data warehouse sized to match the job sent to it. When the analytical tasks finish, the warehouse shuts itself off to save overhead. "In other cloud data warehouses, you would have to unload the data to turn it off and then reload it [to use it again]," he said. Snowflake avoids that data movement.
Although Snowflake runs on AWS at its US West facility in Oregon, customers may use Snowflake without an AWS account. They also don't need to understand the ins and outs of Amazon virtual machine selection. Customers deal with a service layer provided by Snowflake and create a virtual data warehouse when they wish to load their data. "They don't see AWS," Muglia noted.
Customers with their own AWS accounts may use them to load their data directly into S3, and Snowflake will copy it into a virtual data warehouse for them. But most customers who turn to Snowflake will do so to avoid the data-handling and data-management tasks that accompany data warehouse use in the cloud. "We went to great lengths to remove the need for customer care and management of the data," Muglia said.
Hadoop can ingest massive amounts of machine data, then sort and analyze it to produce data in a more structured form. But Hadoop clusters are expensive to set up and operate, claimed Jon Bock, Snowflake's VP of product, in an interview. Snowflake can recognize and assemble metadata on machine data, saving it in a "schema-less way," he said. "We manage the metadata updates and tuning."
The customer is then able to examine the data that he's most interested in by submitting a query, for example, against "a few hundred gigabytes of data in a 100-TB table. This scenario is a killer scenario," he claimed, made possible by Snowflake's cloud-based architecture.
Snowflake offers a virtual data warehouse at $2 per Snowflake credit, which amounts to one virtual CPU running for an hour. A 32-CPU double-extra-large virtual data warehouse running for an hour would cost $64.
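The published arithmetic is straightforward; a quick back-of-envelope check under the stated terms ($2 per credit, one credit = one virtual CPU for one hour):

```python
# Cost check for the published pricing: $2 per credit, where a credit
# is one virtual CPU running for one hour.
PRICE_PER_CREDIT = 2.00

def hourly_cost(vcpus, hours=1):
    return vcpus * hours * PRICE_PER_CREDIT

print(hourly_cost(32))       # 32-CPU double-extra-large, 1 hour -> 64.0
print(hourly_cost(32, 0.5))  # half an hour, if billed fractionally -> 32.0
```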
Snowflake is trying to establish a new category: a cloud-native SQL system extended into unstructured data use. Data warehouse and NoSQL choices already abound in the cloud, and the competition will be keen. Snowflake came out of stealth last October and now has perhaps 12 months to get more than a foot in the door before the choices offered by the NoSQL, Hadoop, and traditional data warehouse systems operating in the cloud prove overwhelming.
Muglia brings impressive marketing and management credentials to the challenge. But over the next year, Snowflake's staff of 75 people in San Mateo, Calif., will have its work cut out for it: persuading enterprise skeptics that the category exists, has the legs to endure, and can save customers pain and money as they pursue their analytics goals.

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book.