HDFS in Practice
Explore how to interact with the Hadoop Distributed File System (HDFS) through practical command line operations. Understand how to create directories, upload files, list contents, and retrieve data from HDFS, gaining hands-on experience with file management in Big Data environments.
So far, we have covered the theory behind HDFS: its components and a high-level view of how it works. Now we turn to a hands-on exercise, interacting with HDFS in a pseudo-distributed cluster running in a Docker container. The PATH environment variable has already been set so that the hdfs executable is available. The hdfs command-line utility exposes three kinds of commands:
- Admin commands
- Client commands
- Daemon commands
Client commands are the most commonly used. Admin and daemon commands are usually run by Hadoop administrators. This overview isn’t a comprehensive study of every command and feature exposed by the hdfs utility. Rather, it should give you enough familiarity to find your way around when performing common tasks. Let’s start!
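As a rough illustration of the three kinds, the sketch below maps a few well-known subcommands to the group under which the Hadoop 3.x usage output lists them. The function name hdfs_command_kind is hypothetical and the mapping is deliberately partial; running hdfs with no arguments remains the authoritative source for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: classifies a handful of hdfs
# subcommands by the group they appear under in the Hadoop 3.x usage output.
hdfs_command_kind() {
  case "$1" in
    dfs|getconf|version)        echo "client" ;;
    dfsadmin|fsck|haadmin)      echo "admin" ;;
    namenode|datanode|balancer) echo "daemon" ;;
    *)                          echo "unknown" ;;
  esac
}

# Print the kind of one example subcommand from each group.
for sub in dfs dfsadmin namenode; do
  printf '%-9s -> %s\n' "$sub" "$(hdfs_command_kind "$sub")"
done
```

The lesson only uses dfs, which falls in the client group.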
- Start by executing hdfs on the command line. Take a minute to observe the output.
  hdfs
  You’ll see a long list of commands and their usage. In this lesson, we’ll examine the dfs subcommand, listed under client commands, which interacts with the filesystem.
- We’ll start by listing the root path of HDFS. Execute:
  hdfs dfs -ls /
  The output shows that there’s only the tmp directory at the root of HDFS.
- Let’s create a directory using the following command:
  hdfs dfs -mkdir -p /MyDirectory
- Next, we’ll upload a file from the local filesystem of the node we are running on to HDFS.
  hdfs dfs -copyFromLocal /DataJek/helloWorld.txt /MyDirectory
  The -copyFromLocal option instructs the hdfs executable to look for the file at the local filesystem path /DataJek/helloWorld.txt and upload it to the HDFS path /MyDirectory.
- Let’s verify that the file has been uploaded.
  hdfs dfs -ls /MyDirectory
  Our helloWorld.txt has been successfully uploaded!
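- Next, we can check the size of our directory MyDirectory as follows:
  hdfs dfs -du -s -h -v /MyDirectory
  Here, -s prints a single summary for the directory, -h shows sizes in human-readable units, and -v adds a header line to the output.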
- We can view the contents of the helloWorld.txt file as follows:
  hdfs dfs -cat /MyDirectory/helloWorld.txt
- We can also use a different command, -text, to view the contents of the helloWorld.txt file. Unlike -cat, -text can also decode some compressed and sequence file formats into readable text.
  hdfs dfs -text /MyDirectory/helloWorld.txt
- Next, we’ll learn how to download a file from HDFS to the local filesystem.
  hdfs dfs -copyToLocal /MyDirectory/helloWorld.txt /Downloads/
- We can also run a find command on the HDFS namespace. The -iname option matches file names case-insensitively:
  hdfs dfs -find / -iname "hello*"
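The steps above can be strung together into one script. The following is a minimal sketch rather than a definitive tool: the function names run and hdfs_round_trip and the DRY_RUN switch are hypothetical, introduced only for illustration. When hdfs is not on the PATH, the script prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# DRY_RUN is a hypothetical switch for this sketch: 1 prints commands, 0 runs them.
# It is forced to 1 when the hdfs executable cannot be found.
if command -v hdfs >/dev/null 2>&1; then
  DRY_RUN="${DRY_RUN:-0}"
else
  DRY_RUN=1
fi

# Execute a command, or just print it in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Upload a local file to an HDFS directory, inspect it, and download it again.
# The && chain stops at the first failing step. Note that -copyFromLocal and
# -copyToLocal fail if the destination file already exists.
hdfs_round_trip() {
  local src="$1" dir="$2" dest="$3"
  local name
  name="$(basename "$src")"
  run hdfs dfs -mkdir -p "$dir" &&
  run hdfs dfs -copyFromLocal "$src" "$dir" &&
  run hdfs dfs -ls "$dir" &&
  run hdfs dfs -cat "$dir/$name" &&
  run hdfs dfs -copyToLocal "$dir/$name" "$dest"
}

# Paths are the ones used in this lesson.
hdfs_round_trip /DataJek/helloWorld.txt /MyDirectory /Downloads/
```

Running the script a second time against the same cluster is expected to stop at the upload step, because the file is already present in /MyDirectory.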
Namenode Webserver
The Namenode serves a web UI at the address specified by the property dfs.namenode.http-address. In Hadoop 3.x, the default port is 9870.
http://localhost:9870/dfshealth.html
A screenshot of the UI appears below:
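The Namenode web server can also be probed from the command line, which is handy when no browser is available. The sketch below assumes curl is installed and that the UI is reachable at localhost:9870; the NAMENODE_HTTP variable and both function names are hypothetical, chosen only for this example.

```shell
#!/usr/bin/env bash
# NAMENODE_HTTP is a hypothetical variable for this sketch; it should match
# dfs.namenode.http-address as seen from the machine running the script.
NAMENODE_HTTP="${NAMENODE_HTTP:-localhost:9870}"

# Build a URL on the Namenode web server for a given path.
namenode_url() {
  printf 'http://%s%s\n' "$NAMENODE_HTTP" "$1"
}

# Print the HTTP status code of the health page.
# curl reports 000 when it cannot connect within the 5-second limit.
namenode_ui_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$(namenode_url /dfshealth.html)" || true
}

namenode_url /dfshealth.html
if command -v curl >/dev/null 2>&1; then
  echo "HTTP status: $(namenode_ui_status)"
fi
```

A status of 200 suggests the UI is up. On a cluster node, hdfs getconf -confKey dfs.namenode.http-address prints the configured address if the default has been changed.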