
Apache Hadoop

Hadoop Mapreduce Combiner Example

In this example, we will learn about Hadoop Combiners. Combiners are highly useful functions offered by Hadoop, especially when we are processing large amounts of data. We will understand combiners using a simple question. 1. Introduction The Hadoop Combiner class is an optional class in the MapReduce framework which is added between the Map class and the Reduce class ...
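To make the idea of a combiner "between the Map class and the Reduce class" concrete, here is a minimal, dependency-free Java sketch that simulates word count without Hadoop. It assumes two hypothetical input splits and counts how many (word, count) pairs would be shuffled to the reducer with and without a combiner. The class and method names (`CombinerSketch`, `shuffle`, `sumByKey`) are illustrative only and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java simulation of word count with an optional combiner (no Hadoop dependency).
 * All names here are hypothetical and only illustrate the data flow.
 */
public class CombinerSketch {

    // Map phase: emit (word, 1) for every whitespace-separated token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : split.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Sum values per key. Used both as the combiner (per split) and as the reducer
    // (across all splits); this is valid because addition is associative and commutative.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // Collect the pairs that would be sent across the network to the reducer.
    static List<Map.Entry<String, Integer>> shuffle(String[] splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            if (useCombiner) {
                // Local aggregation on the mapper side shrinks the shuffled data.
                sumByKey(mapped).forEach((k, v) -> shuffled.add(Map.entry(k, v)));
            } else {
                shuffled.addAll(mapped);
            }
        }
        return shuffled;
    }

    public static void main(String[] args) {
        String[] splits = {"the quick fox", "the lazy dog the end"};
        System.out.println("pairs shuffled without combiner: " + shuffle(splits, false).size());
        System.out.println("pairs shuffled with combiner:    " + shuffle(splits, true).size());
        System.out.println("final counts: " + sumByKey(shuffle(splits, true)));
    }
}
```

The final counts are the same either way; only the volume of intermediate data changes. In a real Hadoop job the reducer class is often registered as the combiner via `Job.setCombinerClass(...)`, but Hadoop treats the combiner as an optimization that may run zero, one, or several times, so the combining function must not change the final result.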


Apache Hadoop as a Service Options

In this article, we will have a look at the available options for using Hadoop as a service, aka HDaaS. Implementing a Hadoop cluster on your own in-house infrastructure is a complex task in itself and needs a dedicated, expert team. To solve this complexity, there are many vendors providing cloud implementations of Hadoop clusters and we will have a ...


Apache Hadoop Hue Tutorial

In this tutorial, we will learn about Hue. This is a basic tutorial for understanding what Hue is and how it can be used in the Hadoop and Big Data ecosystem. 1. Introduction First of all, let us look into what Hue is. Hue is an open source Web interface for analyzing data with any Apache Hadoop based ...


Apache Hadoop Administration Tutorial

In this tutorial, we will look into the administration responsibilities and how to administer a Hadoop cluster. 1. Introduction Apache Hadoop administration includes Hadoop Distributed File System (HDFS) administration as well as MapReduce administration. We will look into both aspects. MapReduce administration means the admin needs to monitor the running applications and tasks, application status, and node configurations for running MapReduce ...


Hadoop Mapper Example

In this example, we will discuss and understand Hadoop Mappers, which are the first half of the Hadoop MapReduce framework. Mappers are the most evident part of any MapReduce application, and a good understanding of Mappers is required to take full advantage of MapReduce's capabilities. 1. Introduction Mapper is the base class which is used to implement the Map ...
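In Hadoop's `org.apache.hadoop.mapreduce.Mapper`, the framework calls `setup` once per task, `map` once per input record, and `cleanup` once at the end; with the default `TextInputFormat`, each record's key is the byte offset of a line and the value is the line itself. The sketch below imitates that lifecycle in plain Java so it runs without Hadoop on the classpath. `SimpleMapper`, `Collector`, and `TokenizerMapper` are hypothetical stand-ins for `Mapper`, `Mapper.Context`, and a word-count mapper, not the real classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, dependency-free imitation of the Mapper lifecycle:
 * setup() once, map() once per input record, cleanup() once.
 * Records mimic TextInputFormat: key = byte offset of the line, value = the line.
 */
public class MapperSketch {

    // Collects emitted key/value pairs, standing in for Mapper.Context.write(...).
    static final class Collector {
        final List<Map.Entry<String, Integer>> output = new ArrayList<>();

        void write(String key, int value) {
            output.add(Map.entry(key, value));
        }
    }

    // Hypothetical base class mirroring the shape of a Hadoop Mapper.
    abstract static class SimpleMapper {
        void setup(Collector ctx) {}

        abstract void map(long offset, String line, Collector ctx);

        void cleanup(Collector ctx) {}

        // Drives the lifecycle over a block of text, computing each line's byte offset.
        final void run(String text, Collector ctx) {
            setup(ctx);
            long offset = 0;
            for (String line : text.split("\n", -1)) {
                map(offset, line, ctx);
                offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            }
            cleanup(ctx);
        }
    }

    // Word-count mapper: ignores the offset key and emits (word, 1) per token.
    static final class TokenizerMapper extends SimpleMapper {
        @Override
        void map(long offset, String line, Collector ctx) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    ctx.write(word, 1);
                }
            }
        }
    }

    public static void main(String[] args) {
        Collector ctx = new Collector();
        new TokenizerMapper().run("hello hadoop\nhello mapper", ctx);
        ctx.output.forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```

Keeping per-record logic in `map` and one-time work (opening resources, reading configuration) in `setup`/`cleanup` is the main design point this lifecycle encourages.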


Hadoop CopyFromLocal Example

In this example, we will understand Hadoop's CopyFromLocal API and the various ways it can be used in applications and in the maintenance of clusters. We assume prior knowledge of what Hadoop is and what it can do, how it works in a distributed fashion, and what the Hadoop Distributed File System (HDFS) is, so that we can go ahead ...


Hadoop Streaming Example

In this example, we will dive into the streaming component of Hadoop MapReduce. We will understand the basics of Hadoop Streaming and see an example using Python. Table Of Contents 1. Introduction 2. Prerequisites and Assumptions 3. Hadoop Streaming Workflow 4. MapReduce Code in Python 4.1. Wordcount Example 4.2. Mapper 4.3. Reducer 5. Testing the Python code 6. Submitting and ...
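Hadoop Streaming runs the mapper and reducer as external processes: each reads lines from standard input and writes `key<TAB>value` lines to standard output, and the reducer receives its input already sorted by key. The excerpt's example uses Python; the sketch below shows the same reducer contract in Java, assuming the mapper emits `word<TAB>1` lines. Because Streaming hands the reducer a flat sorted stream rather than grouped values, the reducer has to detect where one key's run ends itself. `StreamingReducerSketch` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of a Streaming-style word-count reducer in Java.
 * Input: "key<TAB>value" lines sorted by key. Output: one "key<TAB>total" line per key.
 */
public class StreamingReducerSketch {

    // Sums values for each run of identical keys in sorted input.
    static void reduce(Reader input, Writer output) {
        try {
            BufferedReader in = new BufferedReader(input);
            String currentKey = null;
            long sum = 0;
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a key/value separator
                }
                String key = line.substring(0, tab);
                long value = Long.parseLong(line.substring(tab + 1).trim());
                if (currentKey != null && !currentKey.equals(key)) {
                    output.write(currentKey + "\t" + sum + "\n"); // key changed: emit previous total
                    sum = 0;
                }
                currentKey = key;
                sum += value;
            }
            if (currentKey != null) {
                output.write(currentKey + "\t" + sum + "\n"); // emit the last key's total
            }
            output.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Streaming runs the reducer as a process: read stdin, write stdout.
        reduce(new InputStreamReader(System.in, StandardCharsets.UTF_8),
               new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

Writing `reduce` against `Reader`/`Writer` rather than directly against `System.in`/`System.out` keeps it testable locally by piping sorted text through it, which mirrors the usual advice to test Streaming scripts with a shell pipeline before submitting the job with the streaming jar's `-mapper` and `-reducer` options.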


Hadoop Oozie Example

In this example, we will learn about Oozie, a Hadoop ecosystem framework that helps automate work scheduling on Hadoop clusters. 1. Introduction Apache Oozie is an open-source project which is part of the Hadoop ecosystem. It is used to create workflows and automate the scheduling of different jobs and tasks depending on ...


Apache Hadoop Distributed Cache Example

In this example article, we will go through the Apache Hadoop Distributed Cache and understand how to use it with MapReduce jobs. 1. Introduction Distributed Cache, as the name indicates, is a caching system for storing files or data that are required frequently, and this mechanism is distributed in nature, as are all other components of Hadoop. It can cache ...


Apache Hadoop Hive Tutorial

In this example, we will understand what Apache Hive is, where it is used, the basics of Apache Hive, its data types, and its basic operations. 1. Introduction Apache Hive is a data infrastructure tool which works on top of Hadoop to handle big data. It provides a SQL-like query system to interact with the data stored in the Hadoop Distributed ...
