Apache Hadoop — How to install and configure a cluster on Ubuntu 18.04

Hi people!!

In this tutorial, we will install and configure Hadoop on Ubuntu as a single-node cluster.

Before we start, we need the following:

  • Ubuntu 18.04 (a virtual machine in VirtualBox works fine);
  • Java installed.

After installing Ubuntu, check whether Java is already installed with the following command:
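A minimal sketch of such a check. The `JAVA_STATUS` variable is only illustrative and is not part of the original tutorial:

```shell
# Report whether a Java runtime is already on the PATH.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1        # java prints its version banner to stderr
    JAVA_STATUS="installed"
else
    JAVA_STATUS="missing"
fi
echo "Java is $JAVA_STATUS"
```

Running plain `java -version` in a terminal gives the same information; the wrapper just turns the result into a clear yes or no.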

If not, use the following commands to install:
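One plausible way to do it, assuming the OpenJDK 8 package (`openjdk-8-jdk`) from the Ubuntu repositories, since Hadoop 2.x runs on Java 8. The function name `install_java` is hypothetical, and the commands are wrapped in a function so that nothing is installed until you call it:

```shell
# Assumption: OpenJDK 8 from the standard Ubuntu 18.04 repositories.
install_java() {
    sudo apt-get update                       # refresh the package index
    sudo apt-get install -y openjdk-8-jdk     # JDK, which also provides jps
}
```

Call `install_java` from an account with sudo rights, then run `java -version` again to confirm.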

You now have Java installed.

1st step: Hadoop user configuration

To begin, let’s set up a new group of Hadoop users.

Then we add a Hadoop user named hadoopuser.
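A minimal sketch of these two steps on Ubuntu. The group name `hadoop` and the function name `create_hadoop_user` are assumptions; only the user name hadoopuser comes from the tutorial:

```shell
# Assumption: the group is called "hadoop".
create_hadoop_user() {
    sudo addgroup hadoop                        # create the group
    sudo adduser --ingroup hadoop hadoopuser    # create the user in that group (prompts for a password)
}
```

Call `create_hadoop_user` once; `id hadoopuser` should then list the new group.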

2nd step: Install and configure OpenSSH

We will install OpenSSH using the following command:
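As a sketch, assuming the standard Ubuntu packages `openssh-server` and `openssh-client` (the function name `install_openssh` is hypothetical):

```shell
# Assumption: both the SSH server and client come from the Ubuntu repositories.
install_openssh() {
    sudo apt-get install -y openssh-server openssh-client
}
```

Call `install_openssh` from an account with sudo rights.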

Hadoop uses SSH to access nodes. Since we are configuring a single node only, SSH just needs to be able to reach localhost.

First, we switch to the user we created (for example with su - hadoopuser).

Next, we will generate an SSH key pair for hadoopuser.

Let’s add the key we generated earlier to the list of authorized_keys
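Key generation and authorization can be sketched together as below, assuming you are already working as hadoopuser. The function name `setup_localhost_ssh_key`, the RSA key type, and the empty passphrase are assumptions chosen so that Hadoop's scripts can connect without prompting:

```shell
# Generates a key pair (if none exists) and authorizes it for this account.
# Optional argument: the SSH directory, defaulting to ~/.ssh.
setup_localhost_ssh_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    # Create an RSA key pair with an empty passphrase, unless one is already there
    [ -f "$ssh_dir/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$ssh_dir/id_rsa"
    # Append the public key to the list of keys allowed to log in
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Call `setup_localhost_ssh_key` with no arguments to use `~/.ssh`.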

To make sure SSH is working, connect with ssh localhost; it should log you in without asking for a password.

Finally, use the command “exit” to close the connection.

3rd step: Install and configure Hadoop

We will now install Hadoop version 2.9.1

Let’s extract the Hadoop archive.

After unzipping, we will move it to the following directory:

We will assign ownership of the ‘hadoop’ folder to the hadoopuser user
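The download, extraction, move, and ownership steps can be sketched as one function. The download URL (the Apache archive), the target directory `/usr/local/hadoop`, the group `hadoop`, and the function name `install_hadoop` are all assumptions rather than details taken from the original:

```shell
# Assumptions: Apache archive URL, install path /usr/local/hadoop, group "hadoop".
install_hadoop() {
    version="2.9.1"
    wget "https://archive.apache.org/dist/hadoop/common/hadoop-$version/hadoop-$version.tar.gz"
    tar -xzf "hadoop-$version.tar.gz"                     # unpack into ./hadoop-2.9.1
    sudo mv "hadoop-$version" /usr/local/hadoop           # move to the install location
    sudo chown -R hadoopuser:hadoop /usr/local/hadoop     # hand ownership to hadoopuser
}
```

Call `install_hadoop` from an account with sudo rights, in a directory where you are happy to leave the downloaded archive.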

We then move on to the configuration of several files.

To start the configuration we will use the following command to open the bashrc file:

Copy the following settings to the end of the file:
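A hedged sketch of typical settings. The paths assume OpenJDK 8 on amd64 and Hadoop in `/usr/local/hadoop`; adjust them to your machine. The block writes the settings to a scratch file (`SNIPPET`, an illustrative name) and loads them into the current shell so you can check them before touching `~/.bashrc`:

```shell
SNIPPET="$(mktemp)"
cat > "$SNIPPET" <<'EOF'
# Hadoop environment (paths are assumptions)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. "$SNIPPET"                       # load the settings into this shell
echo "HADOOP_HOME=$HADOOP_HOME"
```

When the values look right, `cat "$SNIPPET" >> ~/.bashrc` appends them to the end of the file.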

Save the file, then reload it in the current shell with source ~/.bashrc.

Now let’s edit the file hadoop-env.sh and define the JAVA_HOME variable

And put the following configuration:
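If you prefer to script the change rather than edit by hand, here is a small sketch. The helper name `set_java_home` is hypothetical; it replaces any existing `export JAVA_HOME=` line in the given file with an explicit path:

```shell
# Usage: set_java_home <path-to-hadoop-env.sh> <java-home-directory>
set_java_home() {
    env_file="$1"
    java_home="$2"
    tmp_file="$(mktemp)"
    # Keep every line except an existing JAVA_HOME export
    grep -v '^export JAVA_HOME=' "$env_file" > "$tmp_file"
    # Append the explicit setting
    echo "export JAVA_HOME=$java_home" >> "$tmp_file"
    # Write back in place so the file keeps its owner and permissions
    cat "$tmp_file" > "$env_file"
    rm -f "$tmp_file"
}
```

For example, assuming the paths used elsewhere in this sketch: `set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-openjdk-amd64`.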

Now, let’s set up a single-node cluster.

For this we will use the following files:

  • core-site.xml

Put the following property in the file, within the configuration:
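A minimal sketch, assuming HDFS should listen on localhost port 9000 (a common choice, not confirmed by the original). It writes a complete example file to a scratch directory (`OUT_DIR`, an illustrative name) so you can review it first:

```shell
OUT_DIR="$(mktemp -d)"
cat > "$OUT_DIR/core-site.xml" <<'EOF'
<configuration>
  <!-- Default file system URI; host and port are assumptions -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
echo "Wrote $OUT_DIR/core-site.xml"
```

Copy the `<property>` element into the `<configuration>` element of the real core-site.xml in Hadoop's `etc/hadoop` directory.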

  • hdfs-site.xml

Put the following properties in the file:
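A hedged sketch for a single node: one replica per block, plus storage locations under the hadoop_space folder. The exact subdirectories (`hdfs/namenode`, `hdfs/datanode`) and the `/usr/local` parent are assumptions, and `OUT_DIR` is an illustrative scratch directory:

```shell
OUT_DIR="$(mktemp -d)"
cat > "$OUT_DIR/hdfs-site.xml" <<'EOF'
<configuration>
  <!-- A single node can only hold one copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Storage paths are assumptions -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_space/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_space/hdfs/datanode</value>
  </property>
</configuration>
EOF
echo "Wrote $OUT_DIR/hdfs-site.xml"
```

Copy the three `<property>` elements into the real hdfs-site.xml once the paths match your setup.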

  • yarn-site.xml

Put the following properties:
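As a sketch, these are the settings usually needed for MapReduce jobs to run on YARN; the original values are not shown, so treat them as an assumption. `OUT_DIR` is an illustrative scratch directory:

```shell
OUT_DIR="$(mktemp -d)"
cat > "$OUT_DIR/yarn-site.xml" <<'EOF'
<configuration>
  <!-- Enable the shuffle service that MapReduce needs on each NodeManager -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF
echo "Wrote $OUT_DIR/yarn-site.xml"
```

Copy the `<property>` elements into the real yarn-site.xml.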

  • mapred-site.xml

The file ships as mapred-site.xml.template, so we first rename it to mapred-site.xml with the following command:

Then, let’s open the file to edit it:

Place the following property:
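A minimal sketch, assuming the goal is to run MapReduce jobs on YARN (consistent with the yarn-site.xml step, but not stated in the original). `OUT_DIR` is an illustrative scratch directory:

```shell
OUT_DIR="$(mktemp -d)"
cat > "$OUT_DIR/mapred-site.xml" <<'EOF'
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
echo "Wrote $OUT_DIR/mapred-site.xml"
```

Copy the `<property>` element into the real mapred-site.xml.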

Now, let’s create the directories for the namenode and the datanode, using the following commands:
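A small sketch of this step. The helper name `make_hdfs_dirs` and the `hdfs/namenode` and `hdfs/datanode` layout are assumptions; whatever you choose must match the paths configured in hdfs-site.xml:

```shell
# Usage: make_hdfs_dirs <base-directory>
make_hdfs_dirs() {
    base="$1"
    # -p creates missing parent directories and is harmless if they already exist
    mkdir -p "$base/hdfs/namenode" "$base/hdfs/datanode"
}
```

Because `/usr/local` is owned by root, the direct equivalent on the tutorial machine would be `sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode /usr/local/hadoop_space/hdfs/datanode`.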

Now, let’s format the namenode. First assign ownership of the hadoop_space folder to hadoopuser, then run the format, using the following commands:
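A hedged sketch of both commands. The paths `/usr/local/hadoop_space` and `/usr/local/hadoop`, the group `hadoop`, and the function name `format_namenode` are assumptions:

```shell
# Assumptions: data under /usr/local/hadoop_space, Hadoop under /usr/local/hadoop.
format_namenode() {
    sudo chown -R hadoopuser:hadoop /usr/local/hadoop_space
    # Formatting erases existing HDFS metadata, so run it once on a fresh install
    sudo -u hadoopuser /usr/local/hadoop/bin/hdfs namenode -format
}
```

Call `format_namenode` from an account with sudo rights. If you are already logged in as hadoopuser with Hadoop on your PATH, `hdfs namenode -format` alone does the second half.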

Next we will start the hadoop services:
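As a sketch, assuming you are logged in as hadoopuser and Hadoop's `sbin` directory is on your PATH (the function name `start_hadoop` is hypothetical):

```shell
# start-dfs.sh and start-yarn.sh ship in Hadoop's sbin directory.
start_hadoop() {
    start-dfs.sh     # starts NameNode, DataNode and SecondaryNameNode
    start-yarn.sh    # starts ResourceManager and NodeManager
}
```

Call `start_hadoop`; the matching `stop-dfs.sh` and `stop-yarn.sh` scripts shut the services down again.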

To verify that all services started correctly, run the following command:
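The usual tool for this is `jps`, which ships with the JDK and lists running Java processes. As a sketch, the hypothetical helper below reads `jps` output and prints the name of every expected Hadoop daemon that is not running:

```shell
# Reads `jps` output on stdin; prints each expected daemon that is absent.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words only, so SecondaryNameNode does not count as NameNode
        echo "$jps_output" | grep -qw "$daemon" || echo "$daemon"
    done
}
```

Run it as `jps | missing_daemons`. It prints nothing when all five daemons are up.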

To finish, just access the following URL:

http://localhost:8088/cluster

And that is how a single-node cluster is created.

Finalist student in Computer Engineering