The figure below shows the various steps that the Hadoop MapReduce framework takes after your map function emits a key/value output record. Please note that this figure represents what’s happening with Hadoop versions 1.x and earlier - in Hadoop 2.x there have been some changes which will be discussed in a future blog post.
My book Hadoop in Practice (Manning Publications) in chapter 6 discusses how some of the configuration values in the figure should be tweaked when you start working with mid to large-size Hadoop clusters.
About the author
Alex Holmes is a senior software engineer with over 15 years of experience developing large scale distributed Java systems. Since 2008 he has gained expertise in using Hadoop to solve Big Data problems across a number of projects. He is the author of Hadoop in Practice, a book published by Manning Publications. He has presented at JavaOne and Jazoon.
RECENT BLOG POSTS
Understanding how Parquet integrates with Avro, Thrift and Protocol Buffers
Parquet offers integration with a number of object models, and this post shows how Parquet supports various object models.
Using Oozie 4.4.0 with Hadoop 2.2
Patching Oozie's build so that you can create a package targetting Hadoop 2.2.0.
Hadoop in Practice, Second Edition
A sneak peek at what's coming in the second edition of my book.
Using Hadoop 2.2 as a sink in Flume 1.4
Working around the protobuf 2.5 dependency introduced by Hadoop 2.2.
Simplifying secondary sorting in MapReduce with htuple
Introducing htuple, an open-source project to simplify secondary sorting in MapReduce.