Apache Hadoop

Hadoop Mapper Example

In this example, we will discuss Hadoop Mappers, which form the first half of the Hadoop MapReduce framework. Mappers are the most visible part of any MapReduce application, and a good understanding of them is required to take full advantage of MapReduce's capabilities. 1. Introduction Mapper is the base class which is used to implement the Map ...

Hadoop CopyFromLocal Example

In this example, we will look at the CopyFromLocal API of Hadoop MapReduce and the various ways it can be used in applications and in cluster maintenance. We assume prior knowledge of what Hadoop is and what it can do, how it works in a distributed fashion, and what the Hadoop Distributed File System (HDFS) is, so that we can go ahead ...

Hadoop Streaming Example

In this example, we will dive into the streaming component of Hadoop MapReduce. We will understand the basics of Hadoop Streaming and see an example using Python. Table Of Contents 1. Introduction 2. Prerequisites and Assumptions 3. Hadoop Streaming Workflow 4. MapReduce Code in Python 4.1. Wordcount Example 4.2. Mapper 4.3. Reducer 5. Testing the Python code 6. Submitting and ...

Hadoop Oozie Example

In this example, we will learn about Oozie, a Hadoop ecosystem framework that helps automate work scheduling on Hadoop clusters. 1. Introduction Apache Oozie is an open-source project that is part of the Hadoop ecosystem. It is used to create workflows and automate the scheduling of different jobs and tasks depending on ...

Apache Hadoop Distributed Cache Example

In this example article, we will go through Apache Hadoop Distributed Cache and see how to use it with MapReduce jobs. 1. Introduction Distributed Cache, as the name indicates, is a caching system that stores frequently required files or data, and the mechanism is distributed in nature, like all other components of Hadoop. It can cache ...

Apache Hadoop Hive Tutorial

In this example, we will look at what Apache Hive is, where it is used, its basics, its data types, and its basic operations. 1. Introduction Apache Hive is a data infrastructure tool which works on top of Hadoop to handle big data. It provides a SQL-like query system to interact with the data stored in the Hadoop Distributed ...

Apache Hadoop Wordcount Example

In this example, we will demonstrate the Word Count example in Hadoop. Word count is the basic example for understanding the Hadoop MapReduce paradigm: we count the number of occurrences of each word in an input file and output the list of words together with the number of occurrences of each. 1. Introduction Hadoop ...

Apache Hadoop Distributed File System Explained

In this example, we will discuss the Apache Hadoop Distributed File System (HDFS), its components, and its architecture in detail. HDFS is one of the core components of the Apache Hadoop ecosystem. Table Of Contents 1. Introduction 2. HDFS Design 2.1 System failures 2.2 Can handle large amount of data 2.3 Coherency ...

How to Install Apache Hadoop on Ubuntu

In this example, we will see in detail how to install Apache Hadoop on an Ubuntu system. We will go through all the required steps, starting with the prerequisites of Apache Hadoop, followed by how to configure Hadoop, and we will finish by learning how to insert data into Hadoop and how to run an example ...

Apache Hadoop FS Commands Example

In this example, we will go through the most important commands you may need to know to handle the Hadoop File System (FS). We assume prior knowledge of what Hadoop is and what it can do, how it works in a distributed fashion, and what the Hadoop Distributed File System (HDFS) is, so that we can go ahead and check some examples of how ...
