Apache Hadoop — How to install and configure a cluster on Ubuntu 18.04
Hi people!!
In this tutorial, we will install and configure Hadoop on Ubuntu in a single-node setup.
Before we start, we need the following requirements:
- Ubuntu 18.04 (a virtual machine in VirtualBox works fine);
- Java installed;
After installing Ubuntu, make sure Java is installed by running the following command:
java -version
If not, use the following commands to install it:
sudo apt-get install default-jre
sudo apt-get install default-jdk
Now you have Java installed.
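If you later need the path to your JDK (we will, when setting JAVA_HOME), you can locate it like this; the exact path varies with the JDK version installed:
readlink -f /usr/bin/java | sed "s:/bin/java::"
# Prints something like /usr/lib/jvm/java-11-openjdk-amd64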
1st step: Hadoop user configuration
To begin, let’s create a new group called hadoop:
sudo addgroup hadoop
Then we will add a user named hadoopuser to that group and give it sudo privileges:
sudo adduser --ingroup hadoop hadoopuser
sudo adduser hadoopuser sudo
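To double-check that the user exists and belongs to the right groups, you can run the following (the output shown assumes the two commands above succeeded):
groups hadoopuser
# Should report: hadoopuser : hadoop sudo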
2nd step: Install and configure OpenSSH
We will install OpenSSH using the following command:
sudo apt-get install openssh-server
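Before going any further, you can confirm the SSH server is up (ssh is the service name on Ubuntu 18.04):
sudo systemctl status ssh
# Look for "active (running)" in the output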
Hadoop uses SSH to access its nodes. Since we are configuring a single node, SSH only needs to reach localhost.
We will log in as the user we created:
su - hadoopuser
Next, we will generate an SSH key pair for hadoopuser, with an empty passphrase:
ssh-keygen -t rsa -P ""
Let’s add the key we generated earlier to the list of authorized_keys
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
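If SSH later refuses the key, it is usually a permissions problem; tightening the file permissions is a safe precaution:
chmod 0600 $HOME/.ssh/authorized_keys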
In order to make sure SSH is working, we will use the following command:
ssh localhost
Finally, use the command “exit” to close the connection.
3rd step: Install and configure Hadoop
We will now download Hadoop version 2.9.1:
wget -P ~/Desktop/ https://archive.apache.org/dist/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
Let’s extract the Hadoop archive:
cd ~/Desktop
tar xvzf hadoop-2.9.1.tar.gz
After extracting, we will move it to the following directory:
sudo mv hadoop-2.9.1 /usr/local/hadoop
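A quick listing confirms the move worked (the layout below reflects the 2.9.1 tarball):
ls /usr/local/hadoop
# Should show bin, etc, include, lib, libexec, sbin, share, among others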
We will assign ownership of the ‘hadoop’ folder to the hadoopuser user:
sudo chown -R hadoopuser /usr/local/hadoop
We then move on to the configuration of several files.
To start the configuration, we will open the .bashrc file (no sudo needed, since it belongs to the current user):
gedit ~/.bashrc
Copy the following settings to the end of the file:
export JAVA_HOME=/usr/lib/jvm/default-java # used by HADOOP_CLASSPATH below; adjust if your JDK lives elsewhere
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS=""
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
Save the file and use the following command:
source ~/.bashrc
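To confirm the new environment variables took effect, ask Hadoop for its version:
hadoop version
# Should report Hadoop 2.9.1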
Now let’s edit the file hadoop-env.sh, which lives in Hadoop’s configuration directory, and define the JAVA_HOME variable:
cd /usr/local/hadoop/etc/hadoop/
sudo nano hadoop-env.sh
And set the JAVA_HOME line as follows (the path assumes the default-jdk package installed earlier; adjust it to your JDK if needed):
export JAVA_HOME=/usr/lib/jvm/default-java
Now, let’s set up the single-node cluster itself.
For this we will edit the following files:
- core-site.xml
sudo nano core-site.xml
Put the following property in the file, inside the <configuration> element:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
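For reference, here is a minimal sketch of what the complete file looks like after the edit (the <configuration> element is already present in the stock file; only the property is new):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>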
- hdfs-site.xml
sudo nano hdfs-site.xml
Put the following properties in the file, again inside the <configuration> element (note that the namenode and datanode paths must match the directories we create later):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_space/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_space/hdfs/datanode</value>
</property>
- yarn-site.xml
sudo nano yarn-site.xml
Put the following properties:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
- mapred-site.xml
This file ships as a template named mapred-site.xml.template, so we first copy it to mapred-site.xml with the following command:
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Then, let’s open the file to edit it:
sudo nano mapred-site.xml
Place the following property:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Now, let’s create the directories for the namenode and the datanode, using the following commands:
cd
sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_space/hdfs/datanode
Now, let’s format the namenode. First, assign ownership of the hadoop_space folder to the hadoopuser user:
sudo chown -R hadoopuser /usr/local/hadoop_space
cd
hdfs namenode -format
Next, we will start the HDFS and YARN services (start-all.sh still works in 2.9.1, but it is deprecated in favor of these two scripts):
start-dfs.sh
start-yarn.sh
To verify that all services started correctly, run the following command:
jps
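If everything started correctly, the output should list the five Hadoop daemons plus Jps itself (the process IDs below are illustrative and will differ on your machine):
2705 NameNode
2866 DataNode
3079 SecondaryNameNode
3237 ResourceManager
3565 NodeManager
3885 Jps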
To finish, open the YARN ResourceManager web UI at the following URL:
http://localhost:8088/cluster
The HDFS NameNode web UI is also available at http://localhost:50070.
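As an optional sanity check, you can create a directory in HDFS and run one of the bundled MapReduce examples (the jar path below assumes the stock 2.9.1 tarball layout):
# Create a home directory for hadoopuser in HDFS and list the root
hdfs dfs -mkdir -p /user/hadoopuser
hdfs dfs -ls /
# Run the bundled pi estimator on YARN (2 maps, 5 samples each)
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar pi 2 5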