Apache Hadoop — How to Implement a Multi-Node Distributed Platform

  • Ubuntu 18.04 (it can also run in a VirtualBox virtual machine);
  • Java 8 installed.

1st step: Install ssh

sudo apt install ssh

2nd step: Install pdsh

sudo apt install pdsh

3rd step: Configure .bashrc

nano ~/.bashrc

Add the following line at the end of the file, then reload it with source ~/.bashrc:

export PDSH_RCMD_TYPE=ssh

4th step: Generate an ssh key

ssh-keygen -t rsa -P ""

5th step: Copy the generated key

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

6th step: Check the ssh configuration

ssh localhost

7th step: Install Java version 8

sudo apt install openjdk-8-jdk

8th step: Verify the Java installation

java -version

9th step: Download Hadoop

sudo wget -P /home/arga/Desktop https://mirrors.sonic.net/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

10th step: Extract Hadoop

cd Desktop
tar xzf hadoop-3.2.1.tar.gz

11th step: Rename the folder

mv hadoop-3.2.1 hadoop

12th step: Configure the hadoop-env.sh file

cd hadoop/etc/hadoop
nano hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

13th step: Move the Hadoop folder to /usr/local

Go back to the folder that contains the extracted hadoop directory before moving it:

cd ~/Desktop
sudo mv hadoop /usr/local/hadoop

14th step: Configure the environment file

sudo nano /etc/environment
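The step above leaves the file contents implicit. A typical configuration appends Hadoop's bin and sbin directories to the system PATH and sets JAVA_HOME; the paths below assume the locations used earlier in this guide:

```
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin"
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre"
```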

15th step: Create a Hadoop user

sudo adduser hadoopuser

16th step: Give hadoopuser ownership of the Hadoop folder and sudo privileges

sudo usermod -aG hadoopuser hadoopuser
sudo chown hadoopuser:root -R /usr/local/hadoop/
sudo chmod g+rwx -R /usr/local/hadoop/
sudo adduser hadoopuser sudo

17th step: Check the IP Address

ip addr
  • master:
  • slave1:
  • slave2:

18th step: Configure the hosts file

sudo nano /etc/hosts
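The hosts file must map each machine's IP address to its hostname, and it must be the same on all three machines. The addresses below are placeholders; use the ones reported by ip addr in the previous step:

```
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
```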

19th step: Create the slaves

Create slave1 and slave2, for example by cloning the master virtual machine twice so that all three machines share the same configuration.

20th step: Configure the hostname file

On each machine, set the contents of /etc/hostname to that machine's name (master, slave1, or slave2), then reboot:

sudo nano /etc/hostname
sudo reboot

21st step: Configure SSH in the master

su - hadoopuser

22nd step: Generate (again) an ssh key

ssh-keygen -t rsa
ssh-copy-id hadoopuser@master
ssh-copy-id hadoopuser@slave1
ssh-copy-id hadoopuser@slave2

23rd step: Configure core-site.xml — master

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml
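The file contents are not reproduced above. A minimal core-site.xml for this layout points the default filesystem at the NameNode on the master; port 9000 is a common convention, not a requirement:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```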

24th step: Configure hdfs-site.xml — master

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
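A sketch of hdfs-site.xml for a two-slave cluster. The storage directories are assumptions (any path writable by hadoopuser works), and dfs.replication is set to 2 to match the two DataNodes:

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/data/dataNode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```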

25th step: Configure workers file — master

sudo nano /usr/local/hadoop/etc/hadoop/workers
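The workers file lists the hostnames of the machines that should run the worker daemons (DataNode and NodeManager), one per line. With the names used in this guide:

```
slave1
slave2
```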

26th step: Copy the master configuration files to the slaves

scp /usr/local/hadoop/etc/hadoop/* slave1:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* slave2:/usr/local/hadoop/etc/hadoop/

27th step: Format HDFS files

source /etc/environment
hdfs namenode -format

28th step: Start dfs

On the master, as hadoopuser:

start-dfs.sh

29th step: Configure yarn

Export the following variables (add them to ~/.bashrc to make them permanent):

export HADOOP_HOME="/usr/local/hadoop"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

30th step: Configure yarn-site.xml — slaves

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
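On the slaves, yarn-site.xml mainly needs to tell each NodeManager where the ResourceManager runs; a minimal sketch, assuming the ResourceManager runs on master:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
```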

31st step: Start yarn

On the master:

start-yarn.sh

The final output

If everything worked, running jps on the master should list NameNode, SecondaryNameNode, and ResourceManager, while each slave should show DataNode and NodeManager. The HDFS web interface is available at http://master:9870 and the YARN interface at http://master:8088.

Afonso Antunes, finalist student in Computer Engineering