Hadoop: Add a New DataNode

DataNode:
Use rsync to copy the Hadoop installation from one of the DataNodes you set up previously. Be sure to change any DataNode-specific settings you configured during installation.
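The copy step might look like the sketch below. It assumes the install lives in /usr/local/hadoop (the path used for the slaves file in this guide) and that it runs on the new node; the function name copy_hadoop_from and the hostname datanode1 are hypothetical. If your DataNode storage directory sits under the same path, exclude it as well so the new node does not inherit another node's blocks and storage ID.

```shell
#!/usr/bin/env bash
# Assumption: Hadoop is installed under /usr/local/hadoop on every node.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# copy_hadoop_from is a hypothetical helper, not part of Hadoop.
copy_hadoop_from() {
  source_node="$1"
  # -a preserves permissions and timestamps, -z compresses data in transit.
  # logs/ is skipped so the new node starts with its own log history.
  rsync -az --exclude 'logs/' "${source_node}:${HADOOP_HOME}/" "${HADOOP_HOME}/"
}

# Usage, run on the new DataNode (datanode1 is a placeholder hostname):
#   copy_hadoop_from datanode1
```

The trailing slashes matter: they tell rsync to copy the contents of the source directory into the target rather than nesting a second hadoop directory inside it.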

 hadoop-daemon.sh start datanode
 start-yarn.sh

NameNode:

 nano /usr/local/hadoop/etc/hadoop/slaves

Add the new slave's hostname on its own line.
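If you would rather script this than open an editor, a minimal sketch follows. The helper name add_worker and the hostname datanode4 are hypothetical; the file path is the one used above. The function appends the hostname only when it is not already listed, so it is safe to run more than once.

```shell
#!/usr/bin/env bash
SLAVES_FILE="${SLAVES_FILE:-/usr/local/hadoop/etc/hadoop/slaves}"

# add_worker is a hypothetical helper: add_worker HOSTNAME [FILE]
add_worker() {
  host="$1"
  file="${2:-$SLAVES_FILE}"
  # -x matches the whole line, -F treats the hostname as a literal string.
  if grep -qxF "$host" "$file" 2>/dev/null; then
    return 0
  fi
  # If the file does not end with a newline, add one so hostnames do not merge.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$host" >> "$file"
}

# Usage (datanode4 is a placeholder hostname):
#   add_worker datanode4
```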

 hdfs dfsadmin -refreshNodes

This makes the NameNode re-read its list of permitted DataNodes without a full restart.
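To confirm that the new node has registered after the refresh, one option is to look for its hostname in the live-node report. This is a rough sketch: datanode_is_live is a hypothetical helper, datanode4 is a placeholder hostname, and the check is a plain substring match on the report text rather than a parse of any particular output format.

```shell
#!/usr/bin/env bash
# datanode_is_live is a hypothetical helper: datanode_is_live HOSTNAME
# Succeeds if HOSTNAME appears anywhere in the NameNode's live DataNode report.
datanode_is_live() {
  hdfs dfsadmin -report -live | grep -qF "$1"
}

# Usage (datanode4 is a placeholder hostname):
#   if datanode_is_live datanode4; then echo "registered"; else echo "not yet"; fi
```

A substring match can produce a false positive when one hostname is a prefix of another (datanode1 and datanode10, for example), so treat the result as a quick check and read the full report if in doubt.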

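A newly added DataNode holds no blocks, so rebalance the cluster to a level that makes sense for your environment. The balancer's threshold is the percentage by which a DataNode's disk usage may differ from the cluster average.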

 hdfs balancer -threshold 1 -include ALL_DATA_NODES_HOSTNAME_SEPARATED_BY_COMMA
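Rather than typing the comma-separated list by hand, you could build it from the slaves file. The sketch below assumes every host in that file is a DataNode you want included; worker_list is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# worker_list is a hypothetical helper: worker_list FILE
# Prints the hostnames in FILE as one comma-separated line.
worker_list() {
  # Drop blank lines and comment lines, then join what is left with commas.
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | paste -sd, -
}

# Usage:
#   hdfs balancer -threshold 1 -include "$(worker_list /usr/local/hadoop/etc/hadoop/slaves)"
```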