This is an in-depth article with an Apache Hadoop example. Hadoop is an open-source project with an ecosystem of related tools such as Pig, Hive, HBase, Phoenix, Spark, ZooKeeper, Flume, Sqoop, Oozie, and Storm, along with commercial distributions such as Cloudera. MapReduce is the part of Hadoop used for big data processing.
Hadoop is an open-source framework for distributed big data processing. It can scale out to clusters of more than 1,000 nodes. A Hadoop-based big data architecture is highly scalable and highly available.
Java 7 or 8 is required on a Linux, Windows, or macOS operating system. Maven 3.6.1 is required for building the Hadoop-based application. Apache Hadoop 2.6 can be downloaded from the Hadoop website.
You can set the JAVA_HOME and PATH environment variables as shown below:
Setup
JAVA_HOME="/desktop/jdk1.8.0_73"
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH
The environment variables for Maven are set as below:
Maven Environment
JAVA_HOME="/jboss/jdk1.8.0_73"
export M2_HOME=/users/bhagvan.kommadi/Desktop/apache-maven-3.6.1
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
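Before moving on, it can help to confirm that both tools are actually reachable from the shell. The following is a minimal sketch, not part of the original setup; the helper name `require_cmd` is hypothetical.

```shell
# Hypothetical helper: report whether a command is reachable through PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH" >&2
    return 1
  fi
}

# Check the two tools this setup relies on; report the result but do not abort.
require_cmd java || true
require_cmd mvn || true
```

If both are found, `java -version` and `mvn -version` print the installed versions, which can be compared against the requirements listed earlier.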
After extracting the Hadoop archive, you can start configuring Hadoop.
You need to set the HADOOP_HOME environment variable to the directory where Hadoop was extracted.
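A minimal sketch of this step, following the same pattern as the JAVA_HOME setup. The directory `$HOME/hadoop-2.6` is an assumed, hypothetical location; replace it with the path where the archive was actually extracted.

```shell
# Assumption: Hadoop was extracted to $HOME/hadoop-2.6 (hypothetical path).
HADOOP_HOME="$HOME/hadoop-2.6"
export HADOOP_HOME

# Put the client commands (bin) and the start/stop scripts (sbin) on PATH.
PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
export PATH
```

Adding `sbin` to PATH is optional; it only makes scripts such as `start-dfs.sh` callable from any directory.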
You need to configure $HADOOP_HOME/etc/hadoop/core-site.xml as below:
Core Site – Hadoop Configuration
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://apples-MacBook-Air.local:8020</value>
  </property>
</configuration>
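The fs.defaultFS property sets the default file system URI that Hadoop clients connect to, which in this configuration is the NameNode on port 8020. As a quick sanity check, the value can be read back from the file. The sketch below is only an illustration: the helper name `get_property` is hypothetical, the sample file is a trimmed copy of the configuration above, and the parsing assumes the file holds simple `<name>`/`<value>` pairs rather than arbitrary XML.

```shell
# Hypothetical helper: print the <value> that follows a given <name>.
# $1 = configuration file, $2 = property name
get_property() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Demonstration on a temporary, trimmed copy of core-site.xml.
conf_file="$(mktemp)"
cat > "$conf_file" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://apples-MacBook-Air.local:8020</value>
  </property>
</configuration>
EOF
get_property "$conf_file" fs.defaultFS
rm -f "$conf_file"
```

On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolved, which is the more reliable check.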
Start Hadoop by using the commands below:
Hadoop Execution
cd hadoop-2.6/
cd sbin
./start-dfs.sh
The output of the commands is shown below:
Hadoop Execution
apples-MacBook-Air:sbin bhagvan.kommadi$ ./start-dfs.sh
20/06/29 20:26:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [apples-MacBook-Air.local]
apples-MacBook-Air.local: Warning: Permanently added the ECDSA host key for IP address 'fe80::4e9:963f:5cc3:a000%en0' to the list of known hosts.
Password:
apples-MacBook-Air.local: starting namenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-namenode-apples-MacBook-Air.local.out
Password:
localhost: starting datanode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-datanode-apples-MacBook-Air.local.out
Starting secondary namenodes [0.0.0.0]
Password:
0.0.0.0: starting secondarynamenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-secondarynamenode-apples-MacBook-Air.local.out
20/06/29 20:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
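The output names three daemons: the namenode, the datanode, and the secondary namenode. One way to confirm they are still running is to inspect the JVM process list printed by the JDK's `jps` tool. The sketch below is a hedged illustration; the function name `check_hdfs_daemons` is hypothetical, and it assumes `jps` prints one `<pid> <name>` pair per line.

```shell
# Hypothetical helper: read jps-style output ("<pid> <name>" per line) on
# stdin and report whether each HDFS daemon appears in it.
check_hdfs_daemons() {
  listing="$(cat)"
  missing=0
  for daemon in NameNode DataNode SecondaryNameNode; do
    # Compare the whole second field so that "SecondaryNameNode"
    # is not mistaken for "NameNode".
    if printf '%s\n' "$listing" |
       awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
      echo "$daemon: running"
    else
      echo "$daemon: NOT running"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a machine where Hadoop has been started:
#   jps | check_hdfs_daemons
```

The function returns a non-zero status when any daemon is missing, so it can be used as a guard in a larger script.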
The above procedure is for a single-node Hadoop setup. A multi-node setup is intended for big data workloads. With multiple nodes, data blocks are replicated across nodes to provide fault tolerance. HDFS is used for storing data, and YARN is used for parallel processing.
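To make the block-level picture concrete: HDFS splits each file into fixed-size blocks and keeps several replicas of each block on different nodes. The arithmetic below is a rough sketch that assumes the Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); the function name `hdfs_block_count` and the 1000 MB file size are made up for illustration.

```shell
# Hypothetical helper: number of HDFS blocks needed for a file.
# $1 = file size in MB, $2 = block size in MB (assumed default: 128)
hdfs_block_count() {
  size_mb="$1"
  block_mb="${2:-128}"
  # Ceiling division: a partial final block still counts as one block.
  echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# A 1000 MB file with the assumed defaults.
blocks="$(hdfs_block_count 1000)"
replicas=$(( blocks * 3 ))   # assumed replication factor of 3
echo "blocks=$blocks replicas=$replicas"   # prints "blocks=8 replicas=24"
```

On a single-node setup there is only one datanode, so replicas cannot be spread across machines; the fault-tolerance benefit appears only once more nodes are added.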