
Tag Archives: HDFS

Hadoop Getmerge Example

In this example, we will look at merging multiple files stored in HDFS (Hadoop Distributed File System) into a single file on the local file system, specifically with the getmerge command. 1. Introduction Merging is a task that is needed frequently in Hadoop, and most of the time either the number of files is large or the size of files ...
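
To make the idea concrete, here is a minimal sketch of what `hdfs dfs -getmerge [-nl] <src> <localdst>` does, written against the local file system with plain `java.nio` rather than against HDFS. The class `GetMergeSketch` and its `merge` method are hypothetical illustrations, not part of Hadoop's API; the sketch assumes the source directory holds only regular files whose name order is the desired concatenation order (as with `part-00000`, `part-00001`, ...), and that the destination file lies outside that directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical local-filesystem analogue of `hdfs dfs -getmerge [-nl] src localdst`.
 * Illustration only: it does not connect to HDFS.
 */
public class GetMergeSketch {

    // Concatenates every regular file directly under srcDir, sorted by name,
    // into dst (created or truncated). If addNewline is true, a '\n' is written
    // after each file, mirroring the intent of getmerge's -nl option.
    public static void merge(Path srcDir, Path dst, boolean addNewline) throws IOException {
        List<Path> parts;
        try (Stream<Path> listing = Files.list(srcDir)) {
            parts = listing.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes
                if (addNewline) {
                    out.write('\n');
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small directory of "part" files, then merge them.
        Path dir = Files.createTempDirectory("getmerge-src");
        Files.writeString(dir.resolve("part-00000"), "alpha");
        Files.writeString(dir.resolve("part-00001"), "beta");
        Path merged = Files.createTempFile("merged", ".txt");
        merge(dir, merged, true);
        System.out.print(Files.readString(merged));
    }
}
```

Running `main` should print `alpha` and `beta` on separate lines. Sorting by name is a deliberate choice here so that output from numbered part files is reassembled in a predictable order.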

Is Hadoop a database?

In this article, we will try to address one of the questions most frequently asked by beginners in the Apache Hadoop and Big Data ecosystem: is Hadoop a database? Or, more specifically, is Hadoop a relational database? 1. Is Hadoop a database? No, Hadoop is not a database. To understand the difference ...

Difference Between Big Data and Hadoop

In this article, we will answer a very basic question that beginners in the field of Big Data often have: what is the difference between Big Data and Apache Hadoop? 1. Introduction The difference between Big Data and Apache Hadoop is distinct and quite fundamental, but most people ...

How Does Hadoop Work

Apache Hadoop is open-source software for distributed computing that can process large amounts of data and produce results faster using a reliable and scalable architecture. Apache Hadoop runs on a cluster of commodity hardware, which can range from a couple of machines to thousands of machines. This cluster and involvement of multiple systems ...

The Hadoop Ecosystem Explained

In this article, we will go through the Hadoop ecosystem to see what it consists of and what its different projects are able to do. 1. Introduction Apache Hadoop is an open-source platform managed by the Apache Software Foundation. It is written in Java and is able to process large amounts of data (generally called Big Data) in distributed ...

Apache Hadoop Hue Tutorial

In this tutorial, we will learn about Hue. This is a basic tutorial for understanding what Hue is and how it can be used in the Hadoop and Big Data ecosystem. 1. Introduction First of all, let us look at what Hue is. Hue is an open-source web interface for analyzing data with any Apache Hadoop based ...

Hadoop CopyFromLocal Example

In this example, we will look at the CopyFromLocal command of the Hadoop file system shell and the various ways it can be used in applications and in cluster maintenance. We assume prior knowledge of what Hadoop is, what it can do, how it works in a distributed fashion, and what the Hadoop Distributed File System (HDFS) is, so that we can go ahead ...

Apache Hadoop Distributed File System Explained

In this example, we will discuss the Apache Hadoop Distributed File System (HDFS), its components, and its architecture in detail. HDFS is also one of the core components of the Apache Hadoop ecosystem. Table Of Contents 1. Introduction 2. HDFS Design 2.1 System failures 2.2 Can handle large amount of data 2.3 Coherency ...