Apache Hadoop

Apache Hadoop Nutch Tutorial

In this tutorial, we introduce another component of the Apache Hadoop ecosystem: Apache Nutch. Apache Nutch is a web crawler that takes advantage of the distributed Hadoop ecosystem for crawling data. 1. Introduction Apache Nutch is a production-ready web crawler that relies on Apache Hadoop data structures and makes use of the distributed ...

Apache Hadoop Knox Tutorial

In this tutorial, we will learn about Apache Knox. Knox provides a REST API gateway for the Apache Hadoop ecosystem. We will go through the basics of Apache Knox in the following sections. 1. Introduction Apache Knox is an open source project under the Apache Software Foundation, like most other Hadoop ecosystem projects. It provides a REST API gateway for ...

Hadoop Hbase Maven Example

In this article, we will learn how to use Maven to include HBase in your Apache Hadoop applications, and how Maven's repositories make it easy to write Java HBase applications. 1. Introduction HBase is the NoSQL database available in the Hadoop ecosystem. Like the rest of the Hadoop ecosystem, HBase is also open source and is used when the ...

Hadoop Kerberos Authentication Tutorial

In this tutorial, we will see how to secure a Hadoop cluster and implement authentication in it. Kerberos is an authentication protocol that serves as the standard for implementing security in a Hadoop cluster. 1. Introduction Kerberos is the standard and most widely used way of implementing user authentication in a Hadoop cluster. It is the network authentication ...

Hadoop High Availability Tutorial

In this tutorial, we will look at the High Availability feature of the Apache Hadoop cluster. High Availability is one of the most important features, especially when the cluster is in production. We do not want any single failure to make the whole cluster unavailable, so this is when High Availability of Hadoop comes ...

Hadoop Getmerge Example

In this example, we will look at merging several files into one file in HDFS (Hadoop Distributed File System) with Apache Hadoop, specifically using the getmerge command. 1. Introduction Merging is a task that is required often in Hadoop, and most of the time the number of files is large or the size of files ...

Is Hadoop a database?

In this article, we will try to address one of the questions most often asked by beginners in the Apache Hadoop and Big Data ecosystem: "Is Hadoop a database?" or, more specifically, "Is Hadoop a relational database?" 1. Is Hadoop a database? No, Hadoop is not a database. To understand the difference, we need to understand what exactly a ...

Difference Between Bigdata and Hadoop

In this article, we will answer a very basic question that beginners in the field of Big Data have: what is the difference between Big Data and Apache Hadoop? 1. Introduction The difference between Big Data and Apache Hadoop is distinct and quite fundamental. But most people, especially beginners, are sometimes confused between the ...

Apache Hadoop RecordReader Example

In this example, we will look at the RecordReader component of Apache Hadoop. But before digging into the example code, we would like to look at the theory behind InputFormat and RecordReader to better understand the concept. 1. Introduction To better understand RecordReader, we have to understand InputFormat first. InputFormat defines how the data ...

Hadoop Sequence File Example

In this article, we will look at the Hadoop Sequence File format. Hadoop Sequence Files are one of the Apache Hadoop specific file formats, and they store data as serialized key-value pairs. We will look into the details of Hadoop Sequence Files in the subsequent sections. 1. Introduction Apache Hadoop supports text files which are quite commonly used for storing the ...
