I am feeling great while learning Hadoop and the MapReduce framework. In this post I am sharing my findings from my initial steps in learning MapReduce. This program is a little different from the usual WordCount example, in that it uses the setup and cleanup methods of the Mapper to accomplish the task.
Agenda:
1) The input file is about 2GB, with one number per line. It is generated by a small program that emits random positive numbers.
2) Copy the input file into HDFS
3) Hadoop environment: VMware Player running Ubuntu 11 with Hadoop 0.20 installed in it.
4) Map and Reduce programs to compute the sum of the numbers
5) Driver Program to configure the JobClient
Steps Followed:
1) Data Dumper Program:
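The original dumper program is not shown here; below is a minimal sketch of what it might have looked like, assuming one random positive number per line is written to a local file (the class name, method name, and count are my own placeholders; a real run would use a much larger count to reach ~2GB):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

public class NumberDataDumper {

    // Write 'count' random positive numbers, one per line, to the given file.
    public static void dump(Path file, long count) throws IOException {
        Random random = new Random();
        try (BufferedWriter writer = Files.newBufferedWriter(file)) {
            for (long i = 0; i < count; i++) {
                // nextInt(Integer.MAX_VALUE) yields a number in [0, MAX_VALUE)
                writer.write(Long.toString(random.nextInt(Integer.MAX_VALUE)));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        dump(Paths.get("number-data.xml"), 1_000_000);
    }
}
```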
2) Copy the number-data.xml file from the Ubuntu Desktop folder into the saip input folder in HDFS:
hadoop fs -copyFromLocal number-data.xml saip
3) Write the Mapper Program as follows:
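The trick the post relies on is the Mapper's setup() and cleanup() hooks: a running sum is kept across all map() calls of one task and emitted only once, in cleanup(). A sketch against the Hadoop 0.20 (new) API might look like this (class name and the "Result" key are my own choices):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SumMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private long sum; // running total for this map task

    @Override
    protected void setup(Context context) {
        sum = 0L; // initialised once per task, before any map() call
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line holds one positive number; accumulate it.
        sum += Long.parseLong(value.toString().trim());
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Emit a single record per map task; this is the one line
        // that shows up in each part-m-000xx file.
        context.write(new Text("Result"), new LongWritable(sum));
    }
}
```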
4) Write the Driver Program without the Reducer as follows:
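A driver for the map-only run might look like the sketch below, assuming the hypothetical SumMapper/SumReducer classes above; the Reducer lines stay commented out until step 9, and setNumReduceTasks(0) is what makes the output land in part-m-000xx files instead of going through a reducer:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "number sum"); // the 0.20-era constructor
        job.setJarByClass(SumDriver.class);

        job.setMapperClass(SumMapper.class);

        // Map-only run: no reduce phase, output goes straight to HDFS.
        // For step 9, comment this line out and uncomment the reducer:
        job.setNumReduceTasks(0);
        // job.setReducerClass(SumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```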
5) Start the Hadoop Framework:
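On a single-node Hadoop 0.20 install the daemons are normally started with the start-all.sh script; a typical sequence (paths depend on where Hadoop is installed) is:

```shell
# Format the namenode only on the very first run:
# bin/hadoop namenode -format

# Start the HDFS and MapReduce daemons
# (NameNode, DataNode, JobTracker, TaskTracker):
bin/start-all.sh

# Verify the daemons are running:
jps
```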
6) Run the map-only task as:
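The jar and class names below are placeholders for my own build; the last two arguments are the HDFS input and output paths:

```shell
hadoop jar numbersum.jar SumDriver saip output
```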
7) The output of the map-only task appears in part-m-000xx files. In my case it created 29 map output files, because the file size is around 1.9GB and 29 x 64MB (where 64MB is the default block size) is about 1.9GB. Each file contains one line, Result XXXXXXXXXX, produced by the cleanup() method of the Map class.
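The map-task count can be sanity-checked with a little arithmetic: the number of map tasks equals the number of input splits, which by default is the file size divided by the 64MB block size, rounded up (the 1.9GB figure is approximate):

```java
public class SplitCount {

    // Number of input splits for a file, assuming split size == block size.
    public static long splits(long fileSizeBytes, long blockSizeBytes) {
        // ceiling division
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64MB default block size
        long fileSize = 1_900_000_000L;      // roughly 1.9GB
        System.out.println(splits(fileSize, blockSize)); // prints 29
    }
}
```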
8) Now let us write the Reduce Program as follows:
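Since every map task emits its partial sum under the same key, a single reducer receives all of them and only has to add them up. A sketch (again with my own class name, matching the hypothetical mapper above):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // All per-map partial sums arrive under the single "Result" key;
        // add them up to get the grand total.
        long total = 0L;
        for (LongWritable value : values) {
            total += value.get();
        }
        context.write(key, new LongWritable(total));
    }
}
```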
9) Uncomment the Reducer-related lines in the Driver program and run the job again
10) The output of the Reducer is a single part-r-00000 file containing one line: Result 194340451763551429
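The final total can be read back directly from HDFS (the output path here matches the placeholder used in the run command above):

```shell
hadoop fs -cat output/part-r-00000
```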