Apache Hadoop

Apache Hadoop ETL Tutorial

1. Introduction This is an in-depth article about Hive, the Apache Hadoop ETL tool. Hive is part of the Hadoop ecosystem and is used in Big Data solutions built on Hadoop. It was developed by Facebook. Hadoop is now an Apache open source project. Hive is used as an ETL (Extract, Transform, Load) tool in the Hadoop system for the execution of queries ...

Apache Hadoop Getting Started Example

1. Introduction This is an in-depth article about getting started with Apache Hadoop. Hadoop is an open source project whose ecosystem includes software such as Pig, Hive, HBase, Phoenix, Spark, ZooKeeper, Cloudera, Flume, Sqoop, Oozie, and Storm. MapReduce is the part of Hadoop used for big data processing. 2. Apache Hadoop Getting Started Hadoop is an open source framework for distributed ...

Apache Hadoop Development Tools Eclipse Tutorial

1. Introduction This is an in-depth article about the Apache Hadoop development tools for Eclipse. Eclipse is used for developing Java applications. Apache Hadoop is used for storing and analyzing big data. Developers use Eclipse versions such as Indigo, Juno, Kepler, Oxygen, and Photon. The Hadoop Eclipse tools work well with Eclipse version 3.6 or later. You can manage multiple ...

Big Data Pipeline Tutorial

In this post, we feature a comprehensive tutorial on Big Data Pipeline. 1. Big Data Pipeline – Background Hadoop is an open source data analytics platform that addresses the reliable storage and processing of big data. Hadoop is suitable for handling unstructured data, and its basic components are HDFS and MapReduce. What is HDFS? HDFS provides a flexible data storage ...

Apache Hadoop Nutch Tutorial

In this tutorial, we will introduce another component of the Apache Hadoop ecosystem, Apache Nutch. Apache Nutch is a web crawler that takes advantage of the distributed Hadoop ecosystem for crawling data. 1. Introduction Apache Nutch is a production-ready web crawler which relies on Apache ...

Apache Hadoop Knox Tutorial

In this tutorial, we will learn about Apache Knox. Knox provides a REST API gateway for the Apache Hadoop ecosystem. We will go through the basics of Apache Knox in the following sections. 1. Introduction Apache Knox is an open source project under the Apache Software Foundation, similar to most other ...

Hadoop Hbase Maven Example

In this article, we will learn how to use Maven to include HBase in your Apache Hadoop applications, and how Maven and its repositories make it easy to write Java HBase applications. 1. Introduction HBase is the NoSQL database available in the Hadoop ecosystem. Like the rest of the Hadoop ...

Hadoop Kerberos Authentication Tutorial

In this tutorial, we will see how to secure a Hadoop cluster and implement authentication in it. Kerberos is an authentication protocol that serves as the standard for implementing security in a Hadoop cluster. 1. Introduction Kerberos is the standard and most widely used way of implementing the user authentication ...

Hadoop High Availability Tutorial

In this tutorial, we will look at the High Availability feature of the Apache Hadoop cluster. High Availability is one of the most important features, especially when the cluster is in production. We do not want any single failure to make the whole cluster unavailable, so this is where High Availability of Hadoop comes ...

Hadoop Getmerge Example

In this example, we will look at merging multiple files in HDFS (Hadoop Distributed File System) into one file in Apache Hadoop, specifically with the getmerge command. 1. Introduction Merging is a task that is required often in Hadoop, and most of the time the number of files is large or the size of files ...
