Demand for ever-larger data warehouses and ever-faster access to the data is a worldwide phenomenon. Take the examples of Reliance Communications of India and Sweden's TradeDoubler, a pan-European digital marketing firm. Both companies have replaced legacy data warehouses with alternative technologies that share two advantages: lower cost and better performance than conventional technologies could offer. Reliance needed a highly scalable solution for call data records, so it's building a massive store on data warehouse appliances. TradeDoubler needed faster load speeds and analytic performance, but it also wanted to spend less time rebuilding and tuning the database, so it chose a column-store database.
The Need for Scale
Data volumes are mushrooming around the globe, and particularly in the telco sector in India. "For the last few months, telecom in India has been the fastest-growing market in the world, adding about 10 million customers per month," says Raj Joshi, Vice President of Decision Support Systems at Reliance Communications, one of the country's top mobile, land line and long distance providers. "We've been adding as many as 1.5 million customers per month, and we were looking for a solution that would help us optimize storage, efficiency and cost."
After an extensive review in early 2007, Reliance chose and implemented a 60-terabyte Greenplum data warehouse appliance last summer. That deployment was successful, so it now has a 120-terabyte Greenplum appliance coming online. All 180 terabytes of capacity will be dedicated to storing and retrieving call data records (CDRs), an application that was previously supported by an Oracle data warehouse. With nearly one billion new calls made every day and government requirements to retain call records for 13 months, the 50-terabyte conventional warehouse was quickly running out of headroom.
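To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The call volume, retention period and capacity come from the figures above; the 30-day month, the decimal terabyte and the function names are assumptions made only for illustration, and the result says nothing about how Reliance actually sizes or compresses its records.

```python
# Back-of-the-envelope sizing from the figures quoted in the article.
CALLS_PER_DAY = 1_000_000_000   # "nearly one billion", rounded up
RETENTION_DAYS = 13 * 30        # assumption: 13 months of 30 days each
APPLIANCE_TB = 60 + 120         # the two Greenplum appliances combined
BYTES_PER_TB = 10**12           # assumption: decimal terabytes


def retained_records(calls_per_day: int, retention_days: int) -> int:
    """Number of call records held at any one time under the retention rule."""
    return calls_per_day * retention_days


def bytes_per_record_budget(capacity_tb: int, records: int) -> float:
    """Average storage available per record if capacity were split evenly."""
    return capacity_tb * BYTES_PER_TB / records


records = retained_records(CALLS_PER_DAY, RETENTION_DAYS)
budget = bytes_per_record_budget(APPLIANCE_TB, records)

print(f"Records retained: {records:,}")
print(f"Storage budget per record: {budget:.0f} bytes")
```

Under these assumptions the retention rule implies roughly 390 billion records on hand at any time, which leaves a few hundred bytes of appliance capacity per record.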
"We chose an appliance for the CDR application because it was the fastest-growing piece of our warehouse," Joshi explains. "Access to CDRs in not very frequent, but they need to go in a big database… we needed fast loading and fast retrieval for large amounts of data."
As is common for many first-time appliance deployments, Reliance is essentially offloading a high-volume, data-mart style application from the conventional data warehouse, which continues to support analysis of subscriber demographics and market trends. "Pre-paid [calling cards] account for almost 85 percent of our business, so analysis of recharges, customer user behavior and payment behavior remains in Oracle," says Joshi. "Greenplum was really new technology for us, so the idea was that once the CDR application is proven, we could expand [use of appliances] into other areas."
Reliance is thus far pleased with the Greenplum deployment in two key respects, says Joshi: "I can't comment on our final costs, but the savings were substantial… As far as performance goes, it's about three to five times faster [than our old warehouse], so the queries that were taking a couple of hours now take 30 minutes."
The Need for Speed
Scalability was decidedly not the problem facing TradeDoubler. In fact, the Web marketing firm's warehouse was less than one terabyte, but complex analytic queries against as many as 3 billion rows of data demanded extensive aggregation. What's more, since the firm studies constantly changing clickstream data, the database had to be continually rebuilt, reindexed and tuned.
"You have to structure the database to be able to ask the questions, and that takes a lot of work," says CTO Ola Uden. "We had a one person working with the data full time, but depending on the complexity of the queries, it took anywhere from half a data to two days to get the data out."
Early this year TradeDoubler implemented the Brighthouse column-store database from InfoBright. Column-oriented databases are faster than conventional (row-oriented) databases in many analytic applications because they can query selected attributes without wading through all the non-relevant data in each row. Leading column-store databases are also designed to take advantage of commodity hardware supporting massively parallel processing. TradeDoubler is running Brighthouse on an inexpensive ($12,500) Dell server with two quad-core processors.
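The row-versus-column distinction described above can be illustrated with a toy Python sketch. It is not how Brighthouse is implemented; the sample clickstream fields, the values and the function names are all hypothetical, and real column stores add compression and on-disk layouts that this ignores.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All field names and values are made up for illustration.
rows = [
    {"visitor_id": 1, "site": "a.example", "event": "click", "order_value": 0.0},
    {"visitor_id": 2, "site": "b.example", "event": "purchase", "order_value": 25.0},
    {"visitor_id": 3, "site": "a.example", "event": "purchase", "order_value": 10.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, field):
    """Row layout: every full row is visited to pick out one attribute."""
    return sum(row[field] for row in rows)


def sum_column_store(columns, field):
    """Column layout: only the one requested attribute's values are read."""
    return sum(columns[field])


columns = to_columns(rows)
row_total = sum_row_store(rows, "order_value")
column_total = sum_column_store(columns, "order_value")

# Values stored per layout for this query: all attributes vs. one attribute.
row_values_held = sum(len(row) for row in rows)
column_values_read = len(columns["order_value"])

print(row_total, column_total)
print(row_values_held, column_values_read)
```

Both layouts give the same total, but the column layout reads only the three order values, while the row layout holds twelve values across the rows it has to walk through. That gap widens as rows gain more attributes, which is the effect the article attributes to column-oriented databases on analytic queries.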
TradeDoubler optimizes Web marketing campaigns across Europe and Asia for more than 1,600 advertisers by analyzing Web clicks, impressions and purchases. Customers include online retailers such as Apple and Dell. Brighthouse and Pentaho BI software are serving as the engine behind TradeDoubler's TD Integral Cross-Media Marketing Platform, which is designed to "understand the complete customer journey" across search engines, affiliate sites in TradeDoubler's network and online advertising.
"If Apple is running a campaign for the iPhone, they want to look at how people ended up buying one at their site," explains Mats Johansson, a senior consultant at Lincube Group AB, which helped TradeDoubler with the Brighthouse implementation. "What did they do before they made that purchase? Did they read a review or were they responding to an ad? Which sites were they visiting and how did they arrive at the Apple store?"
TradeDoubler has more than 125,000 Web sites in its network, and it tracks some 20 billion impressions, 265 million unique visitors and 12 million leads per month. The Brighthouse implementation went into production in May, and the firm now loads and rebuilds the database every day, retaining three days of network-wide clickstream data and 60 days' worth of online order information.
TradeDoubler continues to rely on Oracle for many of its transactional processing needs, but constant rebuilding and, in particular, high-volume loading necessitated an alternative approach, says Johansson. "Loading 2 billion rows a day while still maintaining performance on analytic queries would have been quite expensive," he says.
Between faster loading speeds, automated indexing, 30X data compression and faster query times, TradeDoubler is getting faster answers at a lower cost than would have been possible with conventional technology. With appliances, column-store databases and related software-hardware configurations growing in number and diversity (from small-scale to ultra-high-capacity), it looks like the days of building data warehouses from scratch are winding down all over the globe.