Apache Hadoop Getting Started Example

Bhagvan Kommadi | September 10th, 2021 | Last Updated: September 7th, 2021


1. Introduction

This article is a getting-started guide to Apache Hadoop. Hadoop is an open-source project whose ecosystem includes tools such as Pig, Hive, HBase, Phoenix, Spark, ZooKeeper, Flume, Sqoop, Oozie, and Storm, as well as commercial distributions such as Cloudera. MapReduce is the part of Hadoop that is used for big data processing.

2. Apache Hadoop Getting Started

Hadoop is an open-source framework for distributed big data processing. A Hadoop cluster can scale from a single machine to more than 1000 nodes. A Hadoop-based big data architecture is highly scalable and highly available.

2.1 Prerequisites

Java 7 or 8 is required on Linux, Windows, or macOS. Maven 3.6.1 is required for building Hadoop-based applications. Apache Hadoop 2.6 can be downloaded from the Hadoop website.
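
Before downloading anything, it can help to see which prerequisites are already installed. The sketch below only checks whether the `java` and `mvn` commands are on the PATH; `check_tool` is a hypothetical helper name, not part of Java, Maven, or Hadoop.

```shell
#!/bin/sh
# check_tool (hypothetical helper): report whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# The two build prerequisites named in this section.
check_tool java
check_tool mvn
```

For any tool reported as found, `java -version` and `mvn -version` print the installed version, which can be compared against the versions listed above.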

2.2 Download

Java 8 can be downloaded from the Oracle website. Apache Hadoop 2.6 can be downloaded from the Hadoop website.

2.3 Setup

Set the JAVA_HOME and PATH environment variables as shown below:

Setup

JAVA_HOME="/desktop/jdk1.8.0_73"
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH
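
The JDK path in the snippet above is specific to the author's machine, so a quick sanity check is useful after adapting it. The following is a minimal sketch, assuming that a valid JAVA_HOME contains an executable `bin/java`; `java_home_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# java_home_ok (hypothetical helper): succeeds when the given directory
# is non-empty and contains an executable bin/java.
java_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if java_home_ok "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or has no bin/java: ${JAVA_HOME:-<unset>}"
fi
```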

The environment variables for Maven are set as shown below:

Maven Environment

JAVA_HOME="/jboss/jdk1.8.0_73"
export M2_HOME=/users/bhagvan.kommadi/Desktop/apache-maven-3.6.1
export M2=$M2_HOME/bin
export PATH=$M2:$PATH

2.4 Hadoop Getting Started

After extracting the Hadoop archive, you can start configuring Hadoop.

You need to configure HADOOP_HOME as below:

Hadoop Home

export HADOOP_HOME=/users/bhagvan.kommadi/desktop/hadoop-2.6/
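
Optionally, the Hadoop command directories can be added to the PATH so that commands such as `hdfs` and `start-dfs.sh` work from any directory. This is a sketch under the assumption that the archive was extracted to the path used in this article; adjust it to your own location.

```shell
#!/bin/sh
# Assumes the install path from this article unless HADOOP_HOME is already set.
export HADOOP_HOME="${HADOOP_HOME:-/users/bhagvan.kommadi/desktop/hadoop-2.6}"
# Strip a trailing slash so that joined paths stay tidy.
HADOOP_HOME="${HADOOP_HOME%/}"
# bin holds client commands such as hdfs; sbin holds start-dfs.sh.
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
```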

You need to configure $HADOOP_HOME/etc/hadoop/core-site.xml as below:

Core Site – Hadoop Configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://apples-MacBook-Air.local:8020</value>
  </property>
</configuration>
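
The host name in `fs.defaultFS` above belongs to the author's machine, so the value has to be adapted. As a rough sketch, a minimal `core-site.xml` can be generated from a host and port; `write_core_site` is a hypothetical helper, and the example writes to a scratch directory rather than overwriting a real configuration.

```shell
#!/bin/sh
# write_core_site (hypothetical helper): write a minimal core-site.xml into
# directory $1 with fs.defaultFS set to hdfs://$2:$3 (port defaults to 8020).
write_core_site() {
    conf_dir=$1
    host=$2
    port=${3:-8020}
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$host:$port</value>
  </property>
</configuration>
EOF
}

# Example: generate the file in a scratch directory and print it.
scratch=$(mktemp -d)
write_core_site "$scratch" localhost 8020
cat "$scratch/core-site.xml"
```

Once the Hadoop commands are on the PATH, `hdfs getconf -confKey fs.defaultFS` prints the value that Hadoop actually picks up.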

Start Hadoop by using the commands below:

Hadoop Execution

cd hadoop-2.6/
cd sbin
./start-dfs.sh

The output of the commands is shown below:

Hadoop Execution

apples-MacBook-Air:sbin bhagvan.kommadi$ ./start-dfs.sh
20/06/29 20:26:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Starting namenodes on [apples-MacBook-Air.local]
apples-MacBook-Air.local: Warning: Permanently added the ECDSA host key for IP address 'fe80::4e9:963f:5cc3:a000%en0' to the list of known hosts.
Password:
apples-MacBook-Air.local: starting namenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-namenode-apples-MacBook-Air.local.out
Password:
localhost: starting datanode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-datanode-apples-MacBook-Air.local.out
Starting secondary namenodes [0.0.0.0]
Password:
0.0.0.0: starting secondarynamenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-secondarynamenode-apples-MacBook-Air.local.out
20/06/29 20:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
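
The log above reports three daemons being started: the namenode, the datanode, and the secondary namenode. One way to confirm that they are still running is to inspect the output of the JDK's `jps` tool, which lists Java processes as a process id followed by a name. The sketch below assumes that output format; `missing_daemons` is a hypothetical helper, and the sample input stands in for real `jps` output.

```shell
#!/bin/sh
# missing_daemons (hypothetical helper): read jps-style output
# ("<pid> <name>" per line) on stdin and print each expected HDFS
# daemon that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Compare against the whole second field so that "NameNode"
        # is not satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Sample input with made-up process ids; prints "SecondaryNameNode".
printf '101 NameNode\n102 DataNode\n103 Jps\n' | missing_daemons
```

On a machine with a JDK installed, the same check would be run as `jps | missing_daemons`; empty output means all three daemons were found.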

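The above procedure sets up a single-node Hadoop installation. A multi-node cluster is typically set up when the data is too large for one machine to store or process. With multiple nodes, HDFS replicates data blocks across nodes to provide fault tolerance. HDFS is used for storing data, and YARN is used for resource management and parallel processing.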
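
To make the role of data blocks concrete, the arithmetic can be sketched as follows. It assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3, both of which are configurable; `block_replicas` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# block_replicas (hypothetical helper): print how many block replicas HDFS
# would store for a file of $1 MB, given a block size of $2 MB (default 128)
# and a replication factor of $3 (default 3).
block_replicas() {
    size_mb=$1
    block_mb=${2:-128}
    replication=${3:-3}
    # Ceiling division: a partial final block still counts as one block.
    blocks=$(( (size_mb + block_mb - 1) / block_mb ))
    echo $(( blocks * replication ))
}

# A 1000 MB file needs 8 blocks, each stored 3 times: prints 24.
block_replicas 1000
```

Because each block is stored on several nodes, the loss of a single node leaves other copies of its blocks available.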

3. Download the Source Code

Download
You can download the full source code of this example here: Apache Hadoop Getting Started
