Tag Archives: MapReduce

Hadoop Sequence File Example

In this article, we will look at the Hadoop Sequence File format. Hadoop Sequence Files are one of the Apache Hadoop-specific file formats, which store data as serialized key-value pairs. We will look into the details of the Hadoop Sequence File in the subsequent sections. 1. Introduction Apache Hadoop supports text files which are quite commonly used for storing the ...

How Does Hadoop Work

Apache Hadoop is open source software for distributed computing that can process large amounts of data and deliver results faster using a reliable and scalable architecture. Apache Hadoop runs on top of a commodity hardware cluster consisting of multiple systems, which can range from a couple of systems to thousands of systems. This cluster and involvement of multiple systems ...

The Hadoop Ecosystem Explained

In this article, we will go through the Hadoop Ecosystem and see what it consists of and what the different projects are able to do. 1. Introduction Apache Hadoop is an open source platform managed by the Apache Software Foundation. It is written in Java and is able to process large amounts of data (generally called Big Data) in distributed ...

Prerequisites for Learning Hadoop

In this article, we will dig deep to understand the prerequisites for learning and working with Hadoop. We will see what is required and what the industry-standard suggestions are for what to know before you start learning Hadoop. 1. Introduction Apache Hadoop is the entry point or ...

Hadoop Mapreduce Combiner Example

In this example, we will learn about Hadoop Combiners. Combiners are highly useful functions offered by Hadoop, especially when we are processing large amounts of data. We will understand combiners using a simple question. 1. Introduction The Hadoop Combiner class is an optional class in the MapReduce framework which is added in between the Map class and the Reduce class ...

Apache Hadoop Cluster Setup Example (with Virtual Machines)

Table Of Contents 1. Introduction 2. Requirements 3. Preparing Virtual Machine 3.1 Creating VM and Installing Guest OS 3.2 Installing Guest Additions 4. Creating Cluster of Virtual Machines 4.1 VM Network settings 4.2 Cloning the Virtual Machine 4.3 Testing the network IPs assigned to VMs 4.4 Converting to Static IPs for VMs 5. Hadoop prerequisite settings 5.1 Creating User 5.2 ...

Apache Hadoop Distcp Example

In this example, we are going to show you how to copy large files in an inter/intra-cluster setup of Hadoop using the distributed copy tool. 1. Introduction DistCp is short for Distributed Copy in the context of Apache Hadoop. It is basically a tool which can be used when we need to copy large amounts of data/files in an inter/intra-cluster setup. In ...

Hadoop Hello World Example

1. Introduction In this post, we feature a comprehensive Hadoop Hello World Example. Hadoop is an Apache Software Foundation project. It is an open source implementation inspired by Google MapReduce and the Google File System. It is designed for distributed processing of large data sets across a cluster of systems, often running on commodity hardware. Hadoop is designed with an assumption ...
