Apache Hadoop Getting Started Example

1. Introduction

This is an in-depth article on getting started with Apache Hadoop. Hadoop is an open-source project with a broad ecosystem of related software, including Pig, Hive, HBase, Phoenix, Spark, ZooKeeper, Flume, Sqoop, Oozie, and Storm, as well as vendor distributions such as Cloudera. MapReduce is the part of Hadoop used for big data processing.
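To make the MapReduce model more concrete, here is a minimal plain-Java sketch of a word count that runs the three phases (map, shuffle, reduce) inside a single JVM. It does not use the Hadoop API: the class and method names (WordCountSketch, map, shuffle, reduce) are hypothetical, and in a real Hadoop job the framework distributes the map and reduce tasks across nodes and performs the shuffle between them.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce phases; no Hadoop classes are used.
final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key (keys kept sorted).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce for every key.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Hadoop stores big data", "Hadoop processes big data");
        // prints {big=2, data=2, hadoop=2, processes=1, stores=1}
        System.out.println(wordCount(input));
    }
}
```

In Hadoop itself, map and reduce would be implemented as Mapper and Reducer classes, and each phase could run on a different node.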

2. Apache Hadoop Getting Started

Hadoop is an open-source framework for distributed big data processing. It can scale from a single node to clusters of more than 1,000 nodes. Hadoop-based big data architectures are highly scalable and highly available.

2.1 Prerequisites

Java 7 or 8 is required on Linux, Windows, or macOS. Maven 3.6.1 is required for building Hadoop-based applications. Apache Hadoop 2.6 can be downloaded from the Hadoop website.
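A quick way to confirm the Java prerequisite is to ask the running JVM for its version. The sketch below is illustrative (the class JavaVersionCheck is hypothetical); it reads the standard java.version system property, which reports Java 8 as "1.8.0_..." and later releases as "11.0.x" and so on, and reduces it to a major version number.

```java
// Checks that the running JVM is Java 7 or 8, as the prerequisites require.
final class JavaVersionCheck {

    // "1.8.0_252" -> 8, "1.7.0_80" -> 7, "11.0.2" -> 11, "9-ea" -> 9
    static int majorVersion(String version) {
        String[] parts = version.split("\\.");
        // Java 8 and earlier use the legacy "1.x" scheme.
        String major = parts[0].equals("1") && parts.length > 1 ? parts[1] : parts[0];
        // Keep only the leading digits, dropping suffixes such as "-ea".
        int end = 0;
        while (end < major.length() && Character.isDigit(major.charAt(end))) {
            end++;
        }
        return Integer.parseInt(major.substring(0, end));
    }

    static boolean isSupported(String version) {
        int major = majorVersion(version);
        return major == 7 || major == 8;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version
                + (isSupported(version) ? " (OK)" : " (expected Java 7 or 8)"));
    }
}
```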

2.2 Download

Java 8 can be downloaded from the Oracle website. Apache Hadoop 2.6 can be downloaded from the Hadoop website.

2.3 Setup

Set the JAVA_HOME and PATH environment variables as shown below, replacing /path/to/jdk with the installation directory of your JDK:

export JAVA_HOME=/path/to/jdk
export PATH=$JAVA_HOME/bin:$PATH

The environment variables for maven are set as below:

Maven Environment

export M2_HOME=/users/bhagvan.kommadi/Desktop/apache-maven-3.6.1
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
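Before building anything, it can help to verify that these exports took effect. The following sketch (the class EnvCheck is hypothetical) reads the process environment and reports whether JAVA_HOME and M2_HOME are set and whether each one's bin directory appears on PATH. It assumes the layout used above, where the executables live in $JAVA_HOME/bin and $M2_HOME/bin.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Verifies JAVA_HOME and M2_HOME are set and their bin directories are on PATH.
final class EnvCheck {

    // Returns a list of problems; an empty list means the environment looks right.
    static List<String> problems(Map<String, String> env) {
        // Normalize PATH entries (java.io.File drops trailing separators).
        Set<String> pathEntries = new HashSet<>();
        for (String entry : env.getOrDefault("PATH", "").split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                pathEntries.add(new File(entry).getPath());
            }
        }
        List<String> problems = new ArrayList<>();
        for (String var : Arrays.asList("JAVA_HOME", "M2_HOME")) {
            String home = env.get(var);
            if (home == null || home.isEmpty()) {
                problems.add(var + " is not set");
                continue;
            }
            String bin = new File(home, "bin").getPath();
            if (!pathEntries.contains(bin)) {
                problems.add(bin + " is not on PATH");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = problems(System.getenv());
        System.out.println(problems.isEmpty() ? "Environment OK" : String.join("\n", problems));
    }
}
```

Passing the environment in as a Map keeps the check easy to test without touching the real shell environment.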

2.4 Hadoop Getting Started

After extracting the Hadoop archive, you can start configuring Hadoop.

You need to configure HADOOP_HOME as below:

Hadoop Home

export HADOOP_HOME=/users/bhagvan.kommadi/desktop/hadoop-2.6/
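A small check can also confirm that HADOOP_HOME points at the extracted distribution. This sketch (the class HadoopHomeCheck is hypothetical) only looks for the two files this article relies on, sbin/start-dfs.sh and etc/hadoop/core-site.xml, both of which ship inside the Hadoop 2.x archive.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Checks that HADOOP_HOME contains the files used later in this article.
final class HadoopHomeCheck {

    // Returns the expected relative paths that are missing under hadoopHome.
    static List<String> missingEntries(File hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String relative : Arrays.asList("sbin/start-dfs.sh", "etc/hadoop/core-site.xml")) {
            if (!new File(hadoopHome, relative).exists()) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingEntries(new File(home));
        System.out.println(missing.isEmpty() ? "HADOOP_HOME looks valid" : "Missing: " + missing);
    }
}
```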

You need to configure $HADOOP_HOME/etc/hadoop/core-site.xml as below:

Core Site – Hadoop Configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
</configuration>
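If you prefer to generate this file rather than edit it by hand, the sketch below builds a Hadoop-style configuration document with the JDK's built-in XML APIs. The property shown, fs.defaultFS set to hdfs://localhost:9000, is an assumption taken from the Apache single-node (pseudo-distributed) setup guide, not from this article's configuration. The class CoreSiteWriter is hypothetical, and the printed output would still need to be saved as $HADOOP_HOME/etc/hadoop/core-site.xml.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Builds a <configuration> document of <property><name/><value/></property> entries.
final class CoreSiteWriter {

    static String toXml(Map<String, String> properties) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element configuration = doc.createElement("configuration");
        doc.appendChild(configuration);
        for (Map.Entry<String, String> p : properties.entrySet()) {
            Element property = doc.createElement("property");
            Element name = doc.createElement("name");
            name.setTextContent(p.getKey());
            Element value = doc.createElement("value");
            value.setTextContent(p.getValue());
            property.appendChild(name);
            property.appendChild(value);
            configuration.appendChild(property);
        }
        // Serialize the DOM tree; special characters are escaped automatically.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumed value from the Apache pseudo-distributed setup guide.
        props.put("fs.defaultFS", "hdfs://localhost:9000");
        System.out.println(toXml(props));
    }
}
```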

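Start HDFS with the commands below. On a fresh installation, format the HDFS filesystem once before the first start by running bin/hdfs namenode -format from the Hadoop directory.

Hadoop Execution

cd hadoop-2.6/
cd sbin
./start-dfs.sh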

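The output of the commands is shown below. The Password: prompts appear because start-dfs.sh starts each daemon over ssh; setting up passphraseless ssh to localhost avoids them.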

Hadoop Execution Output

apples-MacBook-Air:sbin bhagvan.kommadi$ ./start-dfs.sh
20/06/29 20:26:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Starting namenodes on [apples-MacBook-Air.local]
apples-MacBook-Air.local: Warning: Permanently added the ECDSA host key for IP address 'fe80::4e9:963f:5cc3:a000%en0' to the list of known hosts.
Password:
apples-MacBook-Air.local: starting namenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-namenode-apples-MacBook-Air.local.out
Password:
localhost: starting datanode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-datanode-apples-MacBook-Air.local.out
Starting secondary namenodes []
Password:
starting secondarynamenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-secondarynamenode-apples-MacBook-Air.local.out
20/06/29 20:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
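Once start-dfs.sh has finished, one way to confirm the daemons are up is to probe their ports. The sketch below (the class DaemonProbe is hypothetical) attempts a TCP connection to each port. It assumes port 50070, the default NameNode web UI port in Hadoop 2.x, and port 9000, which only applies if fs.defaultFS is set to hdfs://localhost:9000; adjust both to match your configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Reports whether assumed HDFS ports on localhost accept TCP connections.
final class DaemonProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed ports: NameNode web UI (Hadoop 2.x default) and fs.defaultFS RPC port.
        int[] ports = {50070, 9000};
        for (int port : ports) {
            System.out.println("localhost:" + port + " -> "
                    + (isListening("localhost", port, 1000) ? "listening" : "not reachable"));
        }
    }
}
```

An open port only shows that something is listening; the NameNode web UI in a browser gives a fuller picture of cluster health.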

The procedure above is for a single-node Hadoop setup. Processing big data at scale calls for a multi-node cluster, in which data blocks are replicated across nodes for fault tolerance. HDFS is used for storing data, and YARN manages cluster resources for parallel processing.
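The block and replication ideas can be illustrated with a small calculation. The sketch below (the class BlockPlacementSketch is hypothetical) assumes a 128 MB block size and a replication factor of 3, the Hadoop 2.x defaults for dfs.blocksize and dfs.replication. It places replicas round-robin across nodes, which is a deliberate simplification and not HDFS's actual rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of HDFS block splitting and replica placement.
final class BlockPlacementSketch {

    static final long MB = 1024L * 1024L;

    // Blocks needed = ceiling(fileSize / blockSize); the last block may be partial.
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Assigns each block to `replication` distinct nodes, round-robin (not HDFS's real policy).
    static List<List<Integer>> place(long blocks, int nodes, int replication) {
        if (replication > nodes) {
            throw new IllegalArgumentException("replication cannot exceed the number of nodes");
        }
        List<List<Integer>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add((int) ((b + r) % nodes));
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long fileSize = 300 * MB;
        long blocks = blockCount(fileSize, 128 * MB); // 3 blocks: 128 + 128 + 44 MB
        System.out.println("blocks = " + blocks);
        System.out.println("raw storage (MB) = " + fileSize * 3 / MB); // 900
        System.out.println("placement on 4 nodes = " + place(blocks, 4, 3));
    }
}
```

With 4 nodes, each block keeps its three replicas on three different nodes, so losing any single node still leaves two copies of every block.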

3. Download the Source Code

You can download the full source code of this example here: Apache Hadoop Getting Started

Bhagvan Kommadi

Bhagvan Kommadi is the Founder of Architect Corner and has around 20 years’ experience in the industry, ranging from large-scale enterprise development to helping incubate software product start-ups. He holds a Master's degree in Industrial Systems Engineering from the Georgia Institute of Technology (1997) and a Bachelor's degree in Aerospace Engineering from the Indian Institute of Technology, Madras (1993). He is a member of the IFX forum and Oracle JCP, and a participant in the Java Community Process. He founded Quantica Computacao, the first quantum computing startup in India. Markets and Markets has positioned Quantica Computacao in the ‘Emerging Companies’ section of the Quantum Computing quadrants. Bhagvan has engineered and developed simulators and tools in the area of quantum technology using IBM Q, Microsoft Q# and Google QScript. He has reviewed the Manning book titled "Machine Learning with TensorFlow". He is also the author of the Packt Publishing book "Hands-On Data Structures and Algorithms with Go". He is a member of the MIT Technology Review Global Panel.