
Apache Hadoop FS Commands Example

In this example, we will go through the most important commands you may need to know to work with the Hadoop File System (FS).

We assume prior knowledge of what Hadoop is, what it can do, how it works in a distributed fashion and what the Hadoop Distributed File System (HDFS) is. That way we can go straight to examples of working with the Hadoop File System and its most important commands. The following two examples can help if you are not yet familiar with Apache Hadoop:

Let us get started. As mentioned, in this example we will look at the most frequently used Hadoop File System (FS) commands, which are useful for managing files and data in HDFS clusters.

 

1. Introduction

The Hadoop File System (FS) provides various shell-like commands by default. You can use them through the Hadoop shell to interact with the Hadoop Distributed File System (HDFS) or any other supported file system. The most common commands are the ones used for operations such as creating directories, copying files, viewing file content, and changing the ownership or permissions of files.

2. Common Commands

In this section, we will see the usage and examples of the most common Hadoop FS commands.

2.1. Create a directory

Usage:

hadoop fs -mkdir <paths>

Example:

hadoop fs -mkdir /user/root/dir1

The second command in the screenshot lists the content of a particular path; we will look at it in the next sub-section. The screenshot shows that dir1 has been created.

Create Directory in Hadoop FS

Creating multiple directories with a single command

hadoop fs -mkdir /user/root/dir1 /user/root/dir2

As shown in the example above, to create multiple directories in one go, pass multiple paths separated by spaces.

Make multiple directories with single command
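Depending on the Hadoop version, `-mkdir` without options may fail when a parent directory does not exist yet. The `-p` option creates any missing parents, much like `mkdir -p` in the unix shell, and does not fail if a directory already exists.

The following is a minimal sketch that creates several sub-directories under one base path in a single call. It assumes a POSIX shell with `hadoop` on the `PATH`. `make_hdfs_tree` is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: make_hdfs_tree <base_path> <name>...
# Creates <base_path>/<name> for every name in one `hadoop fs -mkdir` call.
# -p creates missing parent directories and does not fail if they exist.
make_hdfs_tree() {
  base="$1"
  shift
  # Rewrite the positional parameters from "name" to "base/name".
  for name in "$@"; do
    set -- "$@" "$base/$name"
    shift
  done
  hadoop fs -mkdir -p "$@"
}

# Usage (assumed paths):
#   make_hdfs_tree /user/root/project input output logs
```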

2.2. List the content of the directory

Usage:

hadoop fs -ls <paths>

Example:

hadoop fs -ls /user/root/

The command is similar to the ls command of the unix shell.

Listing the files and directories
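`-ls` also accepts `-R` to list a whole directory tree recursively. Because the output is plain text, standard unix tools can post-process it.

A minimal sketch follows. It assumes the usual `-ls` layout, where the first column is the permission string (starting with `d` for directories) and the last column is the path. `list_hdfs_dirs` is a hypothetical name, and paths containing spaces would need more careful parsing.

```shell
#!/bin/sh
# Hypothetical helper: list_hdfs_dirs <hdfs_path>
# Walks the tree with `-ls -R` and prints only the directory paths:
# awk keeps lines whose permission column starts with "d" and prints
# the last column, which holds the path.
list_hdfs_dirs() {
  hadoop fs -ls -R "$1" | awk '$1 ~ /^d/ { print $NF }'
}

# Usage:
#   list_hdfs_dirs /user/root
```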

2.3. Upload a file to HDFS

This command copies one or more files from the local file system to the Hadoop File System.

Usage:

hadoop fs -put <local_files> ... <hdfs_path>

Example:

hadoop fs -put Desktop/testfile.txt /user/root/dir1/

In the screenshot below, we put the file testfile.txt from the Desktop of the local file system into the Hadoop File System at the destination /user/root/dir1.

Uploading the file to Hadoop FS
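`-put` accepts several local files at once, and the `-f` option overwrites files that already exist at the destination. Suppose a single file is put to a path that does not exist yet. Hadoop then stores it under that name as a file rather than placing it inside a directory.

The minimal sketch below therefore creates the destination directory first. It assumes a POSIX shell with `hadoop` on the `PATH`. `upload_to_hdfs` is a hypothetical helper, and the local file names in the usage comment are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: upload_to_hdfs <hdfs_dir> <local_file>...
# Makes sure the HDFS directory exists, then uploads all files into it,
# overwriting existing copies (-f).
upload_to_hdfs() {
  dest="$1"
  shift
  hadoop fs -mkdir -p "$dest" && hadoop fs -put -f "$@" "$dest"
}

# Usage (assumed local files):
#   upload_to_hdfs /user/root/dir1 Desktop/testfile.txt Desktop/other.txt
```

Taking the destination as the first argument keeps the helper POSIX-compatible, since plain `sh` has no portable way to split off the last positional parameter.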

2.4. Download a file from HDFS

Downloads a file from HDFS to the local file system.

Usage:

hadoop fs -get <hdfs_paths> <local_path>

Example:

hadoop fs -get /user/root/dir1/testfile.txt Downloads/

The get command is the counterpart of put. It downloads the file from the Hadoop File System to the local file system, in this case into the Downloads folder.

Download the file from Hadoop FS
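When several HDFS files are downloaded at once, the local destination has to be a directory. A minimal sketch that makes sure it exists before calling `-get`, assuming a POSIX shell with `hadoop` on the `PATH`; `download_from_hdfs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: download_from_hdfs <local_dir> <hdfs_path>...
# Creates the local directory if needed, then copies the HDFS files into it.
download_from_hdfs() {
  target="$1"
  shift
  mkdir -p "$target" && hadoop fs -get "$@" "$target"
}

# Usage:
#   download_from_hdfs Downloads /user/root/dir1/testfile.txt
```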

2.5. View the file content

To view the content of a file, the Hadoop File System provides the cat command, which is again similar to the one in the unix shell.

The following is the content of the file that was uploaded to the Hadoop File System at the path /user/root/dir1/ in the previous steps.

Testfile.txt

Usage:

hadoop fs -cat <paths>

Example:

hadoop fs -cat /user/root/dir1/testfile.txt

We can see that the content displayed in the screenshot below is the same as the content of testfile.txt.

Hadoop FS cat command
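Because `-cat` writes to standard output, it combines with ordinary unix pipes. This is handy for peeking at a large file without downloading it.

A minimal sketch follows, assuming a POSIX shell with `hadoop` on the `PATH`; `hdfs_head` is a hypothetical helper name. Once `head` stops reading, Hadoop may print a harmless warning about being unable to write to the output stream.

```shell
#!/bin/sh
# Hypothetical helper: hdfs_head <hdfs_path> [lines]
# Streams the file with -cat and keeps only the first N lines (default 10).
hdfs_head() {
  hadoop fs -cat "$1" | head -n "${2:-10}"
}

# Usage:
#   hdfs_head /user/root/dir1/testfile.txt 5
```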

2.6. Copying a file

Copying a file from one place to another within the Hadoop File System uses the same syntax as the cp command in the unix shell.

Usage:

hadoop fs -cp <source_path> ... <destination_path>

Example:

hadoop fs -cp /user/root/dir1/testfile.txt /user/root/dir2

When copying, we can also provide multiple source files; in that case the destination must be a directory.

Copying Hadoop FS file from one place to another
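Since a multi-file copy needs a directory as its destination, a script can check this up front with `hadoop fs -test -d`, which exits with status 0 when the path is a directory. This is a minimal sketch, assuming a POSIX shell with `hadoop` on the `PATH`. `copy_into_dir` is a hypothetical helper, and the file names in the usage comment are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: copy_into_dir <hdfs_dir> <hdfs_src>...
# Copies several HDFS files into an existing HDFS directory.
copy_into_dir() {
  dir="$1"
  shift
  # -test -d returns 0 only if the path exists and is a directory.
  if hadoop fs -test -d "$dir"; then
    hadoop fs -cp "$@" "$dir"
  else
    echo "copy_into_dir: $dir is not an HDFS directory" >&2
    return 1
  fi
}

# Usage (assumed files):
#   copy_into_dir /user/root/dir2 /user/root/dir1/a.txt /user/root/dir1/b.txt
```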

2.7. Moving a file from source to destination

The following are the syntax and an example for moving a file from one directory to another within the Hadoop File System.

Usage:

hadoop fs -mv <source_path> <destination_path>

Example:

hadoop fs -mv /user/root/dir1/testfile.txt /user/root/dir2

Moving file from one path to another
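Like the other FS commands, `-mv` expands glob patterns such as `*.txt` on the Hadoop side. The pattern should therefore be quoted so the local shell does not try to expand it against local files. When the pattern matches several files, the destination must be a directory.

A minimal sketch, assuming a POSIX shell with `hadoop` on the `PATH`; `move_matching` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: move_matching <hdfs_glob> <hdfs_dir>
# "$1" is quoted, so the pattern reaches Hadoop unexpanded and is
# matched against HDFS paths, not local files.
move_matching() {
  hadoop fs -mv "$1" "$2"
}

# Usage (note the quotes around the pattern):
#   move_matching '/user/root/dir1/*.txt' /user/root/dir2
```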

2.8. Removing the file or the directory from HDFS

Removing a file or directory from the Hadoop File System is similar to the unix shell. It also has two alternatives, -rm and -rm -r.

Usage:

hadoop fs -rm <path>

Example:

hadoop fs -rm /user/root/dir2/testfile.txt

The above command deletes only the given file. It does not delete a directory, not even an empty one; an empty directory can be removed with -rmdir instead. To delete a directory that contains other files, there is also a recursive version of the remove command.

Removing file from Hadoop FS

If we want to delete a directory that contains files, -rm is not able to delete it. In that case we can use the recursive option. It removes all files in the directory and then the now empty directory itself. Below is an example of the recursive operation:

Usage:

hadoop fs -rm -r <path>

Example:

hadoop fs -rm -r /user/root/dir2

Removing the file recursively
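Because `-rm -r` deletes everything under a path, scripts often confirm first that the path really is a directory. If trash is enabled on the cluster, removed files are moved to the user's trash directory rather than deleted at once; the `-skipTrash` option bypasses this.

The sketch below assumes a POSIX shell with `hadoop` on the `PATH`. `remove_hdfs_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: remove_hdfs_dir <hdfs_dir>
# Deletes the directory and everything under it, but only after
# `-test -d` confirms that the path exists and is a directory.
remove_hdfs_dir() {
  if hadoop fs -test -d "$1"; then
    hadoop fs -rm -r "$1"
  else
    echo "remove_hdfs_dir: $1 is not an HDFS directory, nothing removed" >&2
    return 1
  fi
}

# Usage:
#   remove_hdfs_dir /user/root/dir2
```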

2.9. Displaying the tail of a file

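The command is similar to the unix tail command, except that it displays the last kilobyte of the file rather than a fixed number of lines. With the -f option, it keeps printing data as it is appended to the file, as tail -f does in unix.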

Usage:

hadoop fs -tail <path>

Example:

hadoop fs -tail /user/root/dir1/testfile.txt

Tail command for Hadoop FS file.
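Since `-tail` works in terms of the last kilobyte, getting an exact number of lines requires a different approach. One option is to stream the file with `-cat` and cut it with the local `tail`. That reads the whole file, so for very large files `-tail` remains the cheaper choice.

A minimal sketch, assuming a POSIX shell with `hadoop` on the `PATH`; `hdfs_last_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: hdfs_last_lines <hdfs_path> [lines]
# Prints exactly the last N lines (default 10) by streaming the whole
# file through -cat and trimming it locally with tail.
hdfs_last_lines() {
  hadoop fs -cat "$1" | tail -n "${2:-10}"
}

# Usage:
#   hdfs_last_lines /user/root/dir1/testfile.txt 3
```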

2.10. Displaying the aggregate length of a particular file

To check the length of the content, we can use the -du command as below. If the path is a file, the length of that file is shown. If the path is a directory, the sizes of the files and directories it contains are shown, one line per entry. Adding the -s option shows a single aggregated size for the whole directory instead.

Usage:

hadoop fs -du <path>

Example:

hadoop fs -du /user/root/dir1/testfile.txt

Hadoop Fs Aggregated Length
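For scripts, the aggregated size can be extracted from `-du -s` output (`-h` would instead print human-readable units such as K or M). This is a minimal sketch, assuming a POSIX shell and that the first column of the summary line is the size in bytes; newer Hadoop versions add a second column with the disk space consumed including replicas. `hdfs_size_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: hdfs_size_bytes <hdfs_path>
# -s prints one summary line for the path instead of one line per entry;
# awk prints its first column, assumed to be the size in bytes.
hdfs_size_bytes() {
  hadoop fs -du -s "$1" | awk 'NR == 1 { print $1 }'
}

# Usage:
#   hdfs_size_bytes /user/root/dir1
```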

2.11. Count the directories and files

This command counts the number of directories and files under the specified path. In the following screenshot, the output shows the number of directories (2), the number of files (1), the total content size (159 bytes) and the path these statistics belong to.

Usage:

hadoop fs -count <path>

Example:

hadoop fs -count /user/root/

Count command output
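The `-count` output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME, in that order, which makes individual values easy to pick out in a script. A minimal sketch, assuming a POSIX shell with `hadoop` on the `PATH`; `hdfs_file_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: hdfs_file_count <hdfs_path>
# `-count` prints DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME,
# so the second column is the number of files under the path.
hdfs_file_count() {
  hadoop fs -count "$1" | awk '{ print $2 }'
}

# Usage:
#   hdfs_file_count /user/root
```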

2.12. Details of space in the file system

To get the space-related details of the Hadoop File System, we can use the df command. It reports the amount of space used and the amount of space available on the file system.

Usage:

hadoop fs -df [<path>]

The command can be used with or without a path URI. Without a path URI, it reports on the whole file system. When a path URI is provided, it reports on the file system that the path belongs to.

Example:

hadoop fs -df
hadoop fs -df /user/root

The following screenshot shows the filesystem, its size, the used space, the available space and the used percentage.

DF command output
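The used percentage is handy for simple capacity checks in scripts. The minimal sketch below assumes a POSIX shell and the output layout described above: a header line followed by one line whose fifth column is Use%. `hdfs_used_percent` is a hypothetical helper name; the path argument is optional, as with `-df` itself.

```shell
#!/bin/sh
# Hypothetical helper: hdfs_used_percent [hdfs_path]
# Skips the header line of `-df` output and prints the Use% column
# (the fifth column) without the percent sign.
# ${1:+"$1"} passes the path only when one was given.
hdfs_used_percent() {
  hadoop fs -df ${1:+"$1"} | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Usage:
#   hdfs_used_percent
#   hdfs_used_percent /user/root
```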

3. Conclusion

This brings us to the end of the example. These Hadoop File System commands will give you a head start in working with files and directories in the Hadoop ecosystem.

Raman Jhajj

Ramaninder graduated from the Department of Computer Science and Mathematics of Georg-August University, Germany, and currently works with a Big Data Research Center in Austria. He holds an M.Sc. in Applied Computer Science with a specialization in Applied Systems Engineering and a minor in Business Informatics. He is also a Microsoft Certified Professional with more than 5 years of experience in Java, C#, Web development and related technologies. Currently, his main interests are the Big Data ecosystem, including batch and stream processing systems, Machine Learning and Web Applications.