Does MapReduce Signal The End Of The Relational Era?

Companies such as Google, Yahoo, and Microsoft that operate Internet-scale cloud services need to store and process massive data sets, such as search logs, Web content collected by crawlers, and click-streams collected from a variety of Web services. Each of these companies has developed its own strategy to support parallel computations over multiple petabyte data sets on large clusters of computers.

Roger Smith, Contributor

December 4, 2008


As I wrote last week, the Google Systems Infrastructure Team used Google's MapReduce software framework to sort an astounding one petabyte of data (10 trillion 100-byte records) on 4,000 computers in six hours and two minutes. Earlier this year, Yahoo used Hadoop, an open-source MapReduce implementation, to sort one terabyte of data in 209 seconds on a 910-node cluster.

MapReduce (and Hadoop with it) is a parallel programming model in which users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all the intermediate values associated with the same intermediate key.
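To make the model concrete, here is a minimal, single-machine sketch in Python of the classic word-count example. The function names and the toy driver are illustrative assumptions, not Google's or Hadoop's actual API; a real framework runs the map and reduce phases in parallel across a cluster and handles the shuffle, fault tolerance, and distributed I/O.

# Illustrative sketch of the MapReduce programming model (word count).
# Not Google's or Hadoop's API: map_func, reduce_func, and run_mapreduce
# are hypothetical names used only to show how the pieces fit together.
from collections import defaultdict

def map_func(key, value):
    # key: document name (unused here); value: document text.
    # Emits an intermediate (word, 1) pair for every word seen.
    for word in value.split():
        yield word.lower(), 1

def reduce_func(key, values):
    # key: a word; values: all intermediate counts emitted for that word.
    # Merges them into a single total.
    yield key, sum(values)

def run_mapreduce(inputs, mapper, reducer):
    # Shuffle phase: group every intermediate value by its intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: merge the values associated with each intermediate key.
    results = {}
    for ikey, ivalues in groups.items():
        for okey, ovalue in reducer(ikey, ivalues):
            results[okey] = ovalue
    return results

if __name__ == "__main__":
    docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
    print(run_mapreduce(docs, map_func, reduce_func))
    # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}

The appeal of the model is exactly this division of labor: the programmer writes only the two functions, and the framework takes care of partitioning the input, grouping intermediate keys, and recovering from machine failures at petabyte scale.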

MapReduce adoption has not been without controversy. Earlier this year, database pioneer Michael Stonebraker decried MapReduce and MapReduce clones such as Hadoop as, at least from the perspective of the database community, a major step backwards.
