Amazon Debuts Low-Cost, Big Data Warehousing - InformationWeek


02:30 PM

Amazon Redshift service promises ten times faster query performance than conventional on-premises data warehouses, at one-tenth the price.

Amazon Web Services (AWS) on Wednesday announced Amazon Redshift, a cloud-based data warehouse service that it says will deliver better scalability and performance than conventional on-premises data warehouses at dramatically lower costs.

"We did the math and found that it generally costs between $19,000 and $25,000 per terabyte per year, at list prices, to build and run a good-sized data warehouse on your own," wrote AWS Evangelist Jeff Barr in a blog post announcing the service. "Amazon Redshift, all-in, will cost you less than $1,000 per terabyte per year."

Beyond cost, Amazon said its managed-service approach also frees data warehouse administrators from monitoring, tuning, backups, software patching and fault recovery. Users launch and manage Redshift nodes and clusters from the AWS Management Console, and Amazon said they can start with a few hundred gigabytes and scale up to more than a petabyte.

Redshift is based on relational database technology, so it uses SQL as its query language and is compatible with existing BI tools. The underlying database is evidently ParAccel's: Amazon is an investor in ParAccel, and Amazon's statements about Redshift acknowledge that it licenses key technology from the company.


ParAccel's database includes features such as columnar data storage and advanced compression, but competitors including EMC Greenplum, HP Vertica and Teradata offer them too, and they are promised in the next release of Oracle Database. Whatever the "ten times faster" claim, actual performance will vary with the workload and with which "conventional database" serves as the point of comparison.

What distinguishes Redshift from the previously available Amazon Relational Database Service (RDS) is that Redshift is designed exclusively for warehousing and analytics, rather than transactional workloads, and operates at big-data scale. "RDS is based on Microsoft SQL Server, Oracle and MySQL, and those aren't systems that are designed to do petabyte-scale data warehousing," said Jaspersoft's Karl Van den Bergh, VP of product and alliances. Jaspersoft is one of two initial business intelligence partners on Redshift, along with MicroStrategy, though Amazon said that other BI partners will soon follow.

Despite the potential for big data analysis, Amazon seemed intent on highlighting the opportunity for small and midsize companies to get into data warehousing at very low cost. Customers can choose between two node types, holding either 2 terabytes or 16 terabytes of compressed customer data per node. Pricing starts at $0.85 per hour for a 2-terabyte data warehouse. Reserved-instance pricing lowers the effective rate to $0.228 per hour, or under $1,000 per terabyte per year, according to Amazon.
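Amazon's "under $1,000 per terabyte per year" figure can be checked by annualizing the quoted hourly rates. The snippet below is a back-of-the-envelope sketch, not Amazon's pricing formula: it assumes a single 2-terabyte node running around the clock (24 × 365 = 8,760 hours a year, ignoring leap years) and counts the node's full compressed capacity as stored data. The function name `annual_cost_per_tb` is purely illustrative.

```python
# Back-of-the-envelope sketch: annual Redshift cost per terabyte from the hourly
# rates quoted in the article. Assumptions (not Amazon's): the node runs 24x365,
# and its full 2 TB compressed capacity counts as stored data.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_cost_per_tb(hourly_rate_usd: float, capacity_tb: float) -> float:
    """Return the yearly cost in USD per terabyte of node capacity."""
    return hourly_rate_usd * HOURS_PER_YEAR / capacity_tb


on_demand = annual_cost_per_tb(0.85, 2)  # quoted starting price, 2 TB node
reserved = annual_cost_per_tb(0.228, 2)  # quoted reserved-instance rate

print(f"On-demand: ${on_demand:,.2f} per TB per year")
print(f"Reserved:  ${reserved:,.2f} per TB per year")
```

Under those assumptions the reserved rate comes to about $998.64 per terabyte per year, consistent with Amazon's claim, while the $0.85 starting rate works out to $3,723 per terabyte per year. Both figures are well below the $19,000 to $25,000 range Barr cites for building and running a warehouse in-house.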

"Like anything that Amazon does, they're disrupting the market and offering something that nobody else has been able to offer from a cost-value perspective," said Van den Bergh. "This is a big deal for the data warehousing space, so it will be interesting to see how much uptake it gets."

One thing Amazon doesn't address in detail on its Redshift site is just how companies large and small will upload and synchronize their data with Redshift. Uploading data from one source isn't complicated, but the delays and complexities of data movement multiply as the number of sources increases. Presumably, BI systems will also have to operate in the cloud in order to avoid the potentially time-consuming step of moving data back and forth between on-premises systems and the cloud.

Amazon representatives were not available for comment at press time, but InformationWeek will follow up with deeper analysis of Redshift capabilities and how it might impact the data warehousing industry.


Comments
5/13/2015 | 12:38:59 PM
I've heard about a lot of new warehouse technology that pretty much allows access to a perfect knowledge of what is in a warehouse at any given moment. At the end of the day, this benefits the company by benefitting the customer. If Amazon truly can be "ten times faster", then a lot of products are going to get to people a lot faster. Thanks for a good read!
Mike Lamble
11/30/2012 | 10:32:17 PM
Amazon's Redshift announcement validates that enterprises are ready for cloud-based big data warehousing solutions. XtremeData, available on Amazon as well as other clouds, is targeted at organizations that need a massively scalable DBMS for mixed read and write workloads, for example, with serious ELT. Redshift (a column store licensed from ParAccel) is well suited to read-only data marts of all sizes. The market is rapidly approaching a tipping point where the specialized solutions available on-premises are becoming available in the cloud, on Amazon and elsewhere.