Big Data: The Early Days Are Over - InformationWeek

12:40 PM
Doug Henschen

Big Data: The Early Days Are Over

Netezza, Teradata, Aster Data, Datameer and IBM data warehousing customers share their stories.

At least five innovative data warehousing practitioners have stepped up to share their stories for our "mastering big data" feature article planned for August 9. Their accounts show that application-specific needs are diverse, making generic speeds-and-feeds and TPC-H benchmark claims all the more irrelevant.

I'll get to the list of my latest customer interviews in a moment, but first a refresher. As I detailed in this column, the big-data era isn't new. Despite claims that the market is suddenly red-hot (now that the big vendors have finally responded), data volumes have been steadily growing for years.

Pioneering independent vendors have led the way toward highly scalable and performance-oriented approaches including massively parallel processing (MPP), column-store databases, in-database analysis and, more recently, NoSQL approaches. Going back to the well of stories published in recent years, consider the examples of Sweden's TradeDoubler and India's Reliance Communications, shared in this story posted in June 2008.

TradeDoubler is a pan-European new-media marketing firm that needed faster load speed and analytic performance than it could achieve in an existing Oracle deployment. The company chose Infobright, which offers a column-store database that runs on commodity symmetric multiprocessor (SMP) hardware -- TradeDoubler chose a $12,500 Dell server that's probably much cheaper today.

In June '08, TradeDoubler had more than 125,000 Web sites in its network and was tracking 20 billion ad impressions, 265 million unique visitors and 12 million leads per month. The mart retains only three days' worth of clickstream data and 60 days' worth of aggregated online order data, so it was actually less than a terabyte in size. But with rapid data turnover, TradeDoubler was loading 2 billion rows of data per day, and it was hitting a wall.
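
To put that daily load in perspective, here is a rough back-of-the-envelope calculation in Python. Only the 2-billion-rows-per-day figure comes from TradeDoubler; the four-hour load window is a hypothetical assumption added purely for illustration, and the function name is made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_second(rows: int, window_seconds: int = SECONDS_PER_DAY) -> float:
    """Average ingest rate needed to load `rows` within `window_seconds`."""
    return rows / window_seconds


# Figure reported in the article: 2 billion rows loaded per day.
daily_rows = 2_000_000_000

# Spread evenly around the clock.
print(f"{rows_per_second(daily_rows):,.0f} rows/second sustained")

# Hypothetical assumption: the load must fit into a 4-hour batch window.
four_hours = 4 * 60 * 60
print(f"{rows_per_second(daily_rows, four_hours):,.0f} rows/second in a 4-hour window")
```

Even spread evenly across 24 hours, the load works out to more than 23,000 rows every second, which helps explain why turnover, not total size, was the bottleneck.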

"We had one person working with the data full time, but depending on the complexity of the queries, it took anywhere from half a day to two days to get the data out," explained CTO Ola Uden.

TradeDoubler was able to load, rebuild and query the Infobright database all within the same day. The gains were due partly to column-store compression (said to be 30 times that of a relational database) and partly to the fact that Infobright auto-indexes and doesn't need the partitioning and tuning required to make relational databases perform. (Infobright says its database requires up to 90% less admin work than Oracle, Microsoft SQL Server or IBM DB2 and is half the cost in terms of licensing and storage requirements.)
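
For readers wondering why storing data by column helps compression at all, the toy Python sketch below may help. It is not Infobright's actual algorithm, which the article does not describe; it uses simple run-length encoding as a stand-in, and the sample rows are invented for illustration. The idea is only that values within one column tend to repeat, so they collapse into short runs.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical clickstream rows: (country, event_type).
rows = [
    ("SE", "impression"),
    ("SE", "impression"),
    ("SE", "click"),
    ("UK", "impression"),
    ("UK", "impression"),
    ("UK", "lead"),
]

# A column store keeps each attribute's values together.
country_column = [country for country, _ in rows]
event_column = [event for _, event in rows]

# Six country values shrink to two runs; the event column, which
# repeats less, still shrinks from six values to four runs.
print(run_length_encode(country_column))
print(run_length_encode(event_column))
```

In a row-oriented layout the country and event values are interleaved, so runs like these rarely line up next to each other.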

TradeDoubler's example is one of big-data loading and turnover rather than sheer scale, and it's a common workload requirement in Web clickstream analysis. TradeDoubler could have easily built a larger-scale, higher-performance Oracle-based warehouse (even before the fall 2009 introduction of Oracle Exadata V2), but Uden said the costs would have been much higher than its Infobright investment.
