
Tag Archives: Apache Hadoop

Apache Hadoop Mahout Tutorial

1. Introduction This is an in-depth article on Apache Mahout and its use with Apache Hadoop. Mahout is used in machine learning solutions with Hadoop. Hadoop and Mahout are both Apache open source projects. Apache Mahout began as part of the Lucene project in 2008 and became an independent project in 2010. 2. Apache Hadoop Mahout 2.1 Prerequisites Java 7 ...


Apache Hadoop ETL Tutorial

1. Introduction This is an in-depth article on Hive, the Apache Hadoop ETL tool. Hive is part of the Hadoop Ecosystem and is used in Big Data solutions with Hadoop. Hive was originally developed at Facebook. Hadoop is now an Apache open source project. Hive is used as an ETL (Extraction-Transformation-Load) tool in the Hadoop system for the execution of queries ...


Hadoop High Availability Tutorial

In this tutorial, we will look at the High Availability feature of the Apache Hadoop cluster. High Availability is one of the most important features, especially when the cluster is in production. We do not want any single failure to make the whole cluster unavailable, and this is where High Availability of Hadoop comes ...


Is Hadoop a database?

In this article we will try to address one of the questions most often asked by beginners in the Apache Hadoop and Big Data ecosystem: Is Hadoop a database? Or, more specifically, is Hadoop a relational database? 1. Is Hadoop a database No, Hadoop is not a database. To understand the difference ...


Difference Between Bigdata and Hadoop

In this article, we will answer a very basic question that beginners in the field of Big Data have: what is the difference between Big Data and Apache Hadoop? 1. Introduction The difference between Big Data and Apache Hadoop is distinct and quite fundamental. But most people ...


Apache Hadoop RecordReader Example

In this example, we will look at the RecordReader component of Apache Hadoop. But before digging into the example code, we would like to look at the theory behind the InputStream and RecordReader to better understand the concept. 1. Introduction To better understand RecordReader, we have to ...


The Best Hadoop Analytics Solutions

Data analytics using Hadoop is one of the most important requirements in businesses today, due to the amount of data being generated and the value businesses can derive from it. We will look into some of the best Hadoop analytics solutions available in the market for data analysis. ...


How Does Hadoop Work

Apache Hadoop is open source software for distributed computing that can process large amounts of data and deliver results faster using a reliable and scalable architecture. Apache Hadoop runs on top of a commodity hardware cluster consisting of multiple systems, which can range from a couple of systems to thousands of systems. This cluster and involvement of multiple systems ...


The Hadoop Ecosystem Explained

In this article, we will go through the Hadoop Ecosystem and see what it consists of and what the different projects are able to do. 1. Introduction Apache Hadoop is an open source platform managed by the Apache Software Foundation. It is written in Java and is able to process large amounts of data (generally called Big Data) in distributed ...


Big Data Hadoop Tutorial for Beginners

This tutorial is for beginners who want to start learning about Big Data and the Apache Hadoop Ecosystem. It introduces different concepts of Big Data and Apache Hadoop, which will set the foundation for further learning. Table Of Contents 1. Introduction 2. Big Data? 2.1 Examples of Big Data. 3. Characteristics of Big Data 3.1 ...
