Sears Hadoop Plans: Check Out Data Warehousing's Future - InformationWeek


Doug Henschen

Will Hadoop become the new enterprise data warehouse? Sears' CTO is not alone in seeing a shift in how we'll use relational databases.

(Image: 12 Hadoop Vendors To Watch In 2012)
Radio did not spell the end of newspapers, nor television the end of radio, nor the Internet the end of television. But each advance fundamentally changed the use of the prior platform. And so it will be with Hadoop and relational databases.

If the example of Sears can serve as our guide, Hadoop will become a popular central corporate data repository -- perhaps even the leading data repository eventually. It will take over that role not only because it can handle huge volumes of data more cost effectively than relational databases, but also because it easily ingests varied and complex data without first conforming it to a pre-defined schema, as you have to do when using a database. You can save all your data for the long term and apply schema when you need to use it, rather than imposing a schema before it's loaded onto the platform.
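The "apply schema when you need it" idea is often called schema-on-read. A minimal sketch of the concept in Python, using made-up field names and sample rows (none of this is Sears' actual pipeline): raw records are stored untouched, and a column schema is applied only at query time.

```python
import csv
import io

# Raw events are kept exactly as they arrived -- no schema enforced on write.
raw_events = [
    "2012-10-30,store-042,SKU-9912,2,19.99",
    "2012-10-30,store-042,SKU-1003,1,5.49",
]

def read_with_schema(lines, schema):
    """Apply a column schema (name -> conversion) only at read time."""
    for row in csv.reader(io.StringIO("\n".join(lines))):
        yield {name: convert(value)
               for (name, convert), value in zip(schema.items(), row)}

# A schema is just a view over the raw bytes; a different one could be
# applied to the same data later without reloading anything.
sales_schema = {"date": str, "store": str, "sku": str,
                "qty": int, "price": float}

records = list(read_with_schema(raw_events, sales_schema))
total = sum(r["qty"] * r["price"] for r in records)
```

Because the raw data is never rewritten, adding a new field or reinterpreting an old one means changing only the schema dictionary, not re-running a load job.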

At Sears, Hadoop was first deployed three years ago and it has since become the central hub of all data management activity for the retailer. CTO Phil Shelley tells InformationWeek that Hadoop is giving Sears the flexibility and scale to make use of all the company's data. "We keep all the raw, transactional data, and because there's enough horsepower in Hadoop, you can then transform it into any form you want, whenever you want, on the fly, rather than having to create cubes or aggregations," Shelley explains.
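Transforming raw transactions on demand is the job of Hadoop's native MapReduce model: a map step emits key-value pairs from raw records, a shuffle groups them by key, and a reduce step folds each group into an aggregate. A toy single-process sketch in Python (the data and field names are invented for illustration):

```python
from collections import defaultdict

# Raw transactional records, kept in full rather than pre-aggregated.
transactions = [
    {"sku": "SKU-9912", "qty": 2},
    {"sku": "SKU-1003", "qty": 1},
    {"sku": "SKU-9912", "qty": 5},
]

def map_phase(records):
    """Map: emit one (key, value) pair per raw record."""
    for r in records:
        yield r["sku"], r["qty"]

def shuffle(pairs):
    """Shuffle: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: fold each key's values into an aggregate."""
    return {key: sum(values) for key, values in groups.items()}

units_by_sku = reduce_phase(shuffle(map_phase(transactions)))
```

On a real cluster the map and reduce functions run in parallel across many nodes, which is the "horsepower" that makes recomputing aggregates from raw data practical instead of maintaining pre-built cubes.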

[ Want the inside story on big data plans at Sears? Read Why Sears Is Going All-In On Hadoop. ]

Hadoop has essentially become the enterprise data store at Sears, but that's not quite the same thing as an enterprise data warehouse. The difference is analysis, some of which can be done with the batch, MapReduce processing native to Hadoop. But the retailer is still using relational databases in many situations. InfoBright's columnar database, for example, is used for fast analysis of data aggregations that used to be created -- with much IT time and expense -- as multi-dimensional OLAP cubes. Cube building is now a thing of the past. Instead, fresh data sets are moved from Hadoop into InfoBright on a daily basis.
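That daily hand-off can be pictured as a two-step job: aggregate the raw detail in Hadoop, then bulk-load the boiled-down set into the columnar analytic store. A hedged sketch in Python, with sqlite3 standing in for InfoBright and invented sample data:

```python
import sqlite3
from collections import defaultdict

# Raw detail rows as they might sit in Hadoop: (date, store, sku, qty).
raw = [
    ("2012-10-30", "store-042", "SKU-9912", 2),
    ("2012-10-30", "store-042", "SKU-9912", 1),
    ("2012-10-30", "store-007", "SKU-1003", 4),
]

# Step 1: build the day's aggregate from the raw data
# (the part that would run as a Hadoop job).
totals = defaultdict(int)
for date, store, sku, qty in raw:
    totals[(date, store, sku)] += qty

# Step 2: load the fresh aggregate into the analytic database
# (sqlite3 is used here purely as a stand-in for a columnar store).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE daily_sales (date TEXT, store TEXT, sku TEXT, qty INTEGER)")
db.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
               [(d, s, k, q) for (d, s, k), q in totals.items()])

# Analysts then query the aggregate with plain SQL instead of an OLAP cube.
rows = db.execute(
    "SELECT store, SUM(qty) FROM daily_sales GROUP BY store ORDER BY store"
).fetchall()
```

Since the aggregate is rebuilt from raw data each day, a change in the required dimensions means editing the aggregation job, not rebuilding a cube.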

In another example, Sears' massive Teradata deployment continues to run high-scale, mission-critical analytical applications. "Teradata is still an important platform for us whenever we need a high-speed SQL interface," explains Shelley. "That could be when we're integrating with SAS [analytics] or doing custom analytics with SQL."

That puts Teradata in the role of analytic data mart, however, as opposed to its usual place as the enterprise data warehouse that holds all important data. Nonetheless, Sears is using more Teradata than ever, says Teradata, and perhaps that's because Hadoop enables the retailer to store and retain more data than ever. Sears is now saving data that it used to throw out and it's retaining indefinitely data that it used to keep for only 90 days or two years. More data for analysis brings more analysis.

Lots of Hadoop users share Shelley's perspective on how it can become a central hub for data management -- longtime Hadoop shop JP Morgan Chase started envisioning this role years ago. In fact, at last month's Strata New York event it seemed that the focus on Hadoop has shifted. The questions are no longer "what is Hadoop" and "does it make sense for my company?" People are now asking, "do I have the people I need to run Hadoop," and "how will I analyze and make use of all that information?"

For now, moving boiled-down data sets from Hadoop into existing relational environments will be part of the answer, but that approach involves data-movement delays that plenty of practitioners would like to avoid. "The BI industry has still got its head in the sand, mostly because they're all still thinking about moving and copying data," Shelley tells InformationWeek. "These vendors need to get their act together and write tools that run natively on Hadoop and don't copy the data and use ETL to move it into their environment."

Reader comment, 11/16/2012, re: Sears Hadoop Plans: Check Out Data Warehousing's Future
I see Hadoop as a key component of a big data analytics strategy that complements and needs to integrate with the rest of an enterprise information management infrastructure that may include legacy systems (like the mainframe), relational databases, ERP, CRM, and cloud applications, data warehouse appliances, etc. Not only are the data volumes growing exponentially but the variety of data is increasing with social media, sensor devices, call detail records, industry standards data (e.g. HL7 in healthcare, FIX, SWIFT, and market data in Financial Services, etc.), log files, and the list goes on.

It certainly makes sense to store a lot of the raw multi-structured and unstructured data in Hadoop rather than a traditional relational database. However, even if you assume that over time more and more data will be stored in Hadoop, you still need to access an ever-increasing variety of data from multiple organizations, residing in different systems and formats; then you need to parse and transform it on Hadoop before you can do any useful analysis.

I'm hearing from data scientists that about 80% of the work in a big data project is data integration. In fact, in one study of 35 data scientists, one of them stated, "I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis. Most of the time I'm lucky if I get to do any 'analysis' at all." (Kandel, et al. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Visual Analytics Science and Technology (VAST), 2012). The need for data integration is greater today than it ever has been. The challenge is to make data integration easier and more productive on emerging technologies such as Hadoop. Informatica's PowerCenter Big Data Edition provides a no-code development environment to visually design data integration flows and then execute them on Hadoop, so that data scientists can spend more of their time doing analysis rather than integrating data.