
Apache Hadoop Mahout Tutorial

1. Introduction

This is an in-depth article on Apache Mahout, a library for building machine learning solutions on Hadoop. Hadoop and Mahout are both Apache open-source projects; Hadoop was originally created by Doug Cutting and Mike Cafarella. Apache Mahout started as a subproject of Apache Lucene in 2008 and became an independent top-level project in 2010.

2. Apache Hadoop Mahout

2.1 Prerequisites

Java 7 or 8 is required on Linux, Windows, or macOS, along with Maven 3.6.1. Apache Hadoop 2.9.1 and Mahout 0.9 are used in this example.

2.2 Download

Java 8 can be downloaded from the Oracle website. Apache Maven 3.6.1 can be downloaded from the Apache Maven site. Apache Hadoop 2.9.1 can be downloaded from the Hadoop website, and Apache Mahout 0.9 from the Apache Mahout website.

2.3 Setup

Set the environment variables JAVA_HOME and PATH as shown below:

Setup

JAVA_HOME="/desktop/jdk1.8.0_73"
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH

The environment variables for Maven are set as below:

Maven Environment

JAVA_HOME="/jboss/jdk1.8.0_73"
export M2_HOME=/users/bhagvan.kommadi/Desktop/apache-maven-3.6.1
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
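
To confirm that both tools are picked up from the configured paths, you can check the versions they report (standard JDK and Maven commands):

Verify Setup

java -version
mvn -version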

2.4 How to download and install Hadoop and Mahout

After downloading the Hadoop and Mahout archives, extract them to separate folders. The Mahout JARs, including those in its lib folder, then need to be added to the CLASSPATH variable, as sketched below.
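
A minimal sketch of this step, assuming the Mahout distribution was extracted to the desktop path used in the compile and run commands later in this article (adjust the folder names to match your actual locations):

Setup

export MAHOUT_HOME=/users/bhagvan.kommadi/desktop/mahout-distribution-0.9
export CLASSPATH=$MAHOUT_HOME/*:$MAHOUT_HOME/lib/*:$CLASSPATH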

2.5 Apache Mahout

The name Mahout refers to a person who rides and tends an elephant, a nod to Hadoop's elephant mascot. Apache Mahout is used for developing solutions based on machine learning algorithms such as recommendation, classification, and clustering. Mahout provides a data mining framework, data analysis tools, clustering and classification implementations, evolutionary programming techniques, and matrix and vector libraries. Companies such as Facebook, Yahoo, LinkedIn, Foursquare, and Twitter have used Mahout: Foursquare's recommender engine helps users discover places, Twitter has used Mahout for user interest modelling, and Yahoo has used it for pattern mining.

2.6 Apache Hadoop Configuration

You need to configure HADOOP_HOME as below:

Setup

export HADOOP_HOME=/users/bhagvan.kommadi/desktop/hadoop-2.9.1/

You need to configure $HADOOP_HOME/etc/hadoop/core-site.xml as below:

Core Site XML file

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
 <name>fs.defaultFS</name>
 <value>hdfs://apples-MacBook-Air.local:8020</value>
</property>

</configuration>
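
If this is a fresh installation, HDFS typically needs to be formatted once before the daemons are started for the first time (skip this if your NameNode has already been formatted, as reformatting erases HDFS data):

Hadoop Execution

cd hadoop-2.9.1/
bin/hdfs namenode -format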

You can then start HDFS by using the commands below:

Hadoop Execution

cd hadoop-2.9.1/
cd sbin
./start-dfs.sh

The output of the commands is shown below:

Hadoop Execution

apples-MacBook-Air:sbin bhagvan.kommadi$ ./start-dfs.sh
20/09/14 20:26:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Starting namenodes on [apples-MacBook-Air.local]
apples-MacBook-Air.local: Warning: Permanently added the ECDSA host key for IP address 'fe80::4e9:963f:5cc3:a000%en0' to the list of known hosts.
Password:
apples-MacBook-Air.local: starting namenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-namenode-apples-MacBook-Air.local.out
Password:
localhost: starting datanode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-datanode-apples-MacBook-Air.local.out
Starting secondary namenodes [0.0.0.0]
Password:
0.0.0.0: starting secondarynamenode, logging to /Users/bhagvan.kommadi/desktop/hadoop-2.9.1/logs/hadoop-bhagvan.kommadi-secondarynamenode-apples-MacBook-Air.local.out
20/09/14 20:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
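
Once the script finishes, you can confirm that the daemons are running with the JDK's jps tool; a successful start lists the NameNode, DataNode, and SecondaryNameNode processes.

Verify HDFS

jps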

2.7 Apache Hadoop Mahout

Let us start with a basic use case for Mahout: a recommendation engine. The first step is to create a data model. Below is the data model file (userdata.txt), with one user,item,preference triple per line.

Data Model (userdata.txt)

1,11,2.0
1,12,5.0
1,13,5.0
1,14,5.0
1,15,4.0
1,16,5.0
1,17,1.0
1,18,5.0
2,10,1.0
2,11,2.0
2,15,5.0
2,16,4.5
2,17,1.0
2,18,5.0
3,11,2.5
3,12,4.5
3,13,4.0
3,14,3.0
3,15,3.5
3,16,4.5
3,17,4.0
3,18,5.0
4,10,5.0
4,11,5.0
4,12,5.0
4,13,0.0
4,14,2.0
4,15,3.0
4,16,1.0
4,17,4.0
4,18,1.0

The PearsonCorrelationSimilarity class is used to create the UserSimilarity; it takes the data model in its constructor. The data model file has the User, Item, and Preference columns, and it is passed as a File to the FileDataModel constructor.

Data Model

DataModel usermodel = new FileDataModel(new File("userdata.txt")); 

The next step is to create the UserSimilarity using PearsonCorrelationSimilarity inside a RecommenderBuilder implementation, as shown below. The buildRecommender() method is completed step by step in the following snippets.

User Similarity

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
    public Recommender buildRecommender(DataModel model) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // the neighborhood and the recommender are added in the next steps
    }
};

In the code below, NearestNUserNeighborhood is used for the UserNeighborhood: it computes the neighborhood of the n users most similar to the given user, and here the neighborhood size is set to 2. An alternative, ThresholdUserNeighborhood, keeps all users whose similarity to the given user meets or exceeds a chosen threshold; a sketch of that variant follows the code block.

User Neighborhood

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
    public Recommender buildRecommender(DataModel model) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        // the recommender itself is returned in the next step
    }
};
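
If you prefer the threshold-based neighborhood (ThresholdUserNeighborhood is already imported in the full listing further down), the neighborhood line can be swapped as in this minimal sketch; the threshold value of 0.1 is only an illustrative choice, since PearsonCorrelationSimilarity produces similarities between -1.0 and 1.0:

User Neighborhood (threshold variant)

// keep every user whose similarity to the given user is at least 0.1
UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);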

The next step is to create the GenericUserBasedRecommender as shown below:

User Recommender

RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
    public Recommender buildRecommender(DataModel model) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        return new GenericUserBasedRecommender(model, neighborhood, similarity);
    }
};

The next step is to invoke the recommend() method of the Recommender interface. Its parameters are the user id and the number of recommendations to return. The code below requests one recommendation for user 3:

User Recommender

Recommender recommender = recommenderBuilder.buildRecommender(usermodel);
List<RecommendedItem> recommendations = recommender.recommend(3, 1);
System.out.println("Recommendations " + recommendations);
for (RecommendedItem recommendationItem : recommendations) {
    System.out.println(recommendationItem);
}

Below is the complete class which shows the implementation of RecommendationEngine.

Recommendation Engine

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecommendationEngine {

    public static void main(String args[]) {
        try {
            // load the user,item,preference triples from the data model file
            DataModel usermodel = new FileDataModel(new File("userdata.txt"));
            System.out.println(usermodel);

            // user-based recommender: Pearson similarity with a 2-nearest-neighbour neighborhood
            RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
                public Recommender buildRecommender(DataModel model) throws TasteException {
                    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                    return new GenericUserBasedRecommender(model, neighborhood, similarity);
                }
            };

            Recommender recommender = recommenderBuilder.buildRecommender(usermodel);

            // recommend one item for user 3
            List<RecommendedItem> recommendations = recommender.recommend(3, 1);
            System.out.println("Recommendations " + recommendations);
            for (RecommendedItem recommendationItem : recommendations) {
                System.out.println(recommendationItem);
            }
        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }
}

To compile the code above, you can use the command below:

Recommendation Engine

 javac -cp "/Users/bhagvan.kommadi/desktop/mahout-distribution-0.9/*" RecommendationEngine.java 
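
Since Maven 3.6.1 is listed as a prerequisite, the Mahout libraries can alternatively be declared as a Maven dependency instead of pointing the classpath at the extracted distribution. A minimal sketch of the relevant pom.xml fragment, using the Mahout 0.9 coordinates published to Maven Central:

Maven Dependency

<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>0.9</version>
</dependency>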

To execute the code, the command below is used.

Recommendation Engine

java -cp "/Users/bhagvan.kommadi/desktop/mahout-distribution-0.9/*:.:/Users/bhagvan.kommadi/desktop/mahout-distribution-0.9/lib/*" RecommendationEngine

The output when the above command is executed is shown below.

Recommendation Engine

apples-MacBook-Air:apache_mahout bhagvan.kommadi$ java -cp "/Users/bhagvan.kommadi/desktop/mahout-distribution-0.9/*:.:/Users/bhagvan.kommadi/desktop/mahout-distribution-0.9/lib/*" RecommendationEngine
log4j:WARN No appenders could be found for logger (org.apache.mahout.cf.taste.impl.model.file.FileDataModel).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
FileDataModel[dataFile:/Users/bhagvan.kommadi/Desktop/JavacodeGeeks/Code/apache_mahout/userdata.txt]
Recommendations [RecommendedItem[item:10, value:1.0]]
RecommendedItem[item:10, value:1.0]
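
In this run, item 10 is recommended for user 3: it is the only item in the data model that user 3 has not rated, and the recommender estimates a preference of 1.0 for it. Because the recommender is built through a RecommenderBuilder, it can also be plugged into Mahout's evaluation API to measure how well it predicts held-out preferences. This is an optional sketch, not part of the listing above; it reuses the usermodel and recommenderBuilder variables and splits each user's data 70/30 into training and test preferences:

Recommender Evaluation

// requires: import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
// requires: import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
// lower scores are better: 0.0 means predicted preferences match the held-out ones exactly
double score = evaluator.evaluate(recommenderBuilder, null, usermodel, 0.7, 1.0);
System.out.println("Average absolute difference: " + score);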

3. Download the Source Code

You can download the full source code of this example here: Apache Hadoop Mahout Tutorial

Bhagvan Kommadi

Bhagvan Kommadi is the Founder of Architect Corner and has around 20 years' experience in the industry, ranging from large-scale enterprise development to helping incubate software product start-ups. He holds a Masters in Industrial Systems Engineering from the Georgia Institute of Technology (1997) and a Bachelors in Aerospace Engineering from the Indian Institute of Technology, Madras (1993). He is a member of the IFX forum and the Oracle JCP, and a participant in the Java Community Process. He founded Quantica Computacao, the first quantum computing startup in India; Markets and Markets has positioned Quantica Computacao in the 'Emerging Companies' section of its Quantum Computing quadrants. Bhagvan has engineered and developed simulators and tools in the area of quantum technology using IBM Q, Microsoft Q# and Google QScript. He reviewed the Manning book "Machine Learning with TensorFlow" and is the author of the Packt Publishing book "Hands-On Data Structures and Algorithms with Go". He is a member of the MIT Technology Review Global Panel.