Apache Hadoop — How to install and configure a cluster on Ubuntu 18.04

  • Ubuntu 18.04 (a virtual machine in VirtualBox works fine);
  • Java installed. Check whether Java is already present and, if it is not, install the default JRE and JDK:

java -version
sudo apt-get install default-jre
sudo apt-get install default-jdk
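
Note that Hadoop 2.x is built and tested against Java 8, so if default-jdk pulls in a newer release, installing openjdk-8-jdk instead is the safer choice. You will also need the JDK path later on to set JAVA_HOME; a quick way to find it (the exact output depends on your install):

readlink -f $(which java)

Strip the trailing /bin/java (and, on Java 8, the /jre) from the result to get the JAVA_HOME directory.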

1st step: Hadoop user configuration

sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoopuser
sudo adduser hadoopuser sudo
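
To confirm the new account ended up in the right groups, id gives a quick check (the numeric IDs below are illustrative):

id hadoopuser
uid=1001(hadoopuser) gid=1001(hadoop) groups=1001(hadoop),27(sudo)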

2nd step: Install and configure OpenSSH

sudo apt-get install openssh-server
su - hadoopuser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
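
That last ssh localhost should log you in without a password prompt (type exit to leave the nested session); Hadoop's start scripts depend on this. A quick non-interactive check, with an illustrative success message:

ssh -o BatchMode=yes localhost true && echo "passwordless SSH OK"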

3rd step: Install and configure Hadoop

sudo wget -P ~/Desktop/ https://archive.apache.org/dist/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
cd ~/Desktop
sudo tar xvzf hadoop-2.9.1.tar.gz
sudo mv hadoop-2.9.1 /usr/local/hadoop
sudo chown -R hadoopuser /usr/local/hadoop
gedit ~/.bashrc

Append the following lines at the end of the file. Note that the last line expands ${JAVA_HOME}, so JAVA_HOME must also be exported in your shell (for example, the directory you found with readlink above):

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS=""
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
source ~/.bashrc
cd /usr/local/hadoop/etc/hadoop/
sudo nano hadoop-env.sh
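
The one change hadoop-env.sh normally needs is an explicit JAVA_HOME, since Hadoop does not reliably pick it up from the shell. The path below assumes an OpenJDK 8 install on Ubuntu 18.04; use the directory you found with readlink earlier:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

After saving, confirm the environment is wired up:

hadoop version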

Now, let's configure the files that turn this installation into a single-node cluster. All four files live in /usr/local/hadoop/etc/hadoop/, and each <property> snippet below goes between the <configuration> tags already present in the file (a complete example is shown after the list).

  • core-site.xml
sudo nano core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
  • hdfs-site.xml
sudo nano hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
  • yarn-site.xml
sudo nano yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
  • mapred-site.xml
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo nano mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
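
For reference, this is roughly what core-site.xml should look like once edited; the other three files follow the same pattern, with their respective properties inside the <configuration> element:

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

(fs.default.name is the older, deprecated spelling of fs.defaultFS; Hadoop 2.x accepts both, the newer one just avoids a deprecation warning.)
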
cd

The storage directories must match the dfs.namenode.name.dir and dfs.datanode.data.dir paths configured in hdfs-site.xml above:

sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
sudo chown -R hadoopuser /usr/local/hadoop_tmp
hdfs namenode -format
start-all.sh
jps
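
If everything came up cleanly, jps should report the five Hadoop daemons alongside itself (the process IDs here are illustrative):

2705 NameNode
2866 DataNode
3073 SecondaryNameNode
3245 ResourceManager
3370 NodeManager
3708 Jps

You can also open the NameNode web UI at http://localhost:50070 and the ResourceManager UI at http://localhost:8088 to inspect the cluster.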

And this is how a single-node cluster is created.
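
As a final smoke test, you can run one of the MapReduce examples that ship with Hadoop; the pi job below estimates π with 2 map tasks of 5 samples each, so it finishes in well under a minute:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar pi 2 5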
