This repository provides a simple way to set up a distributed Hadoop Distributed File System (HDFS) cluster using Docker containers. The setup includes one NameNode (the master) and two DataNodes (the workers) to simulate a basic HDFS environment for testing or learning purposes.
- Docker Engine (version 19.03.0+)
- Docker Compose (version 1.27.0+)
- At least 4GB of RAM available
- At least 10GB of free disk space
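You can verify the installed versions against these minimums before starting:

```bash
# Check local tool versions
docker --version
docker-compose --version
```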
The setup consists of three key components:
- 1 NameNode: Acts as the master node managing the filesystem metadata, serving the HDFS Web UI on port 9870 and the NameNode RPC service on port 9000.
- 2 DataNodes: Worker nodes responsible for storing the actual data blocks in HDFS. Each DataNode has its own Web UI for monitoring.
- 3 Volumes: Persistent volumes for storing HDFS data, ensuring that data is preserved even if the containers are restarted.
- NameNode Web UI: 9870
- NameNode Service: 9000
- DataNode 1 Web UI: 9864
- DataNode 2 Web UI: 9865
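Once the cluster is running, a simple smoke test is to probe each mapped port; the curl flags below just print the HTTP status code:

```bash
# Each request should return a 2xx/3xx status once the services are up
curl -s -o /dev/null -w "NameNode UI:   %{http_code}\n" http://localhost:9870
curl -s -o /dev/null -w "DataNode 1 UI: %{http_code}\n" http://localhost:9864
curl -s -o /dev/null -w "DataNode 2 UI: %{http_code}\n" http://localhost:9865
```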
- Clone this repository:
git clone https://github.com/L00kAhead/hadoop_cluster
cd hadoop_cluster
- Start the cluster:
docker-compose up -d
- Verify the cluster is running:
docker ps
- Access the NameNode Web UI:
http://localhost:9870
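Once the containers are up, you can also confirm that both DataNodes have registered with the NameNode (this assumes the hdfs binary is on the container's PATH, as is typical for Hadoop images):

```bash
# Should report two live DataNodes and the cluster's total capacity
docker exec namenode hdfs dfsadmin -report
```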
The docker-compose.yml file defines the following services:
- namenode: The HDFS master node
- datanode1: First HDFS worker node
- datanode2: Second HDFS worker node
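You can list the service names Compose parsed from the file without starting anything:

```bash
# Prints the three service names, one per line
docker-compose config --services
```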
Basic HDFS configuration file specifying the NameNode address:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
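Because fs.defaultFS points at hdfs://namenode:9000, path-only HDFS commands resolve against this NameNode automatically; the two commands below are equivalent:

```bash
# Relative form, resolved via fs.defaultFS
docker exec namenode hdfs dfs -ls /
# Fully qualified form
docker exec namenode hdfs dfs -ls hdfs://namenode:9000/
```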
Contains essential Hadoop configuration parameters:
- HDFS service configuration
- Permission settings
- NameNode and DataNode directory paths
- Handler count settings
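To verify what the daemons actually picked up, you can read individual keys from inside a container with hdfs getconf:

```bash
# Print effective configuration values as the cluster sees them
docker exec namenode hdfs getconf -confKey fs.defaultFS
docker exec namenode hdfs getconf -confKey dfs.replication
```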
The following persistent volumes are configured:
- hadoop_namenode: Stores NameNode metadata
- hadoop_datanode1: Stores data for the first DataNode
- hadoop_datanode2: Stores data for the second DataNode
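Note that Docker Compose normally prefixes volume names with the project name (the directory name by default), so the names on your host may look like hadoop_cluster_hadoop_namenode. To locate and inspect them:

```bash
# List the cluster's volumes and show where one lives on the host
docker volume ls | grep hadoop
docker volume inspect hadoop_cluster_hadoop_namenode   # adjust the prefix to your project name
```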
# List running containers
docker ps
# Check NameNode logs
docker logs namenode
# Check DataNode logs
docker logs datanode1
docker logs datanode2
# Enter NameNode container
docker exec -it namenode bash
# Basic HDFS commands
hdfs dfs -ls /
hdfs dfs -mkdir /user
hdfs dfs -put localfile /user/
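A quick round trip, run from inside the NameNode container, verifies writes and reads end to end (file and directory names here are illustrative):

```bash
# Write a local file into HDFS, read it back, then clean up
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /user/test
hdfs dfs -put /tmp/hello.txt /user/test/
hdfs dfs -cat /user/test/hello.txt    # prints: hello hdfs
hdfs dfs -rm -r /user/test
```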
To add more DataNodes (a sketch follows this list):
- Copy the datanode service configuration in docker-compose.yml
- Update the container name and volume mapping
- Add new volume definition
- Restart the cluster
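The snippet below is only a sketch of the new service and volume; it assumes the same image, environment, and data path as the existing datanode services, so copy the exact values from datanode1/datanode2 in this repository's docker-compose.yml rather than trusting these placeholders:

```yaml
# Under the existing `services:` key:
  datanode3:
    image: <same image as datanode1>                  # placeholder: copy from the real file
    container_name: datanode3
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:9000   # assumption: matches the other DataNodes
    volumes:
      - hadoop_datanode3:/hadoop/dfs/data             # assumption: same data path as datanode1/2
    ports:
      - "9866:9864"                                   # 9864 and 9865 are already taken on the host
    depends_on:
      - namenode

# Under the top-level `volumes:` key:
  hadoop_datanode3:
```

After editing the file, docker-compose up -d brings up the new DataNode alongside the already-running containers.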
docker-compose down # Stops containers
docker-compose down -v # Stops containers and removes volumes
- If DataNodes are not connecting (see the commands after this list):
  - Check if the NameNode is properly running
  - Verify network connectivity between the containers
  - Check logs using docker logs <container-name>
- If the Web UI is not accessible:
  - Verify port mappings
  - Check if containers are running
  - Ensure no port conflicts with other local services
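Two quick checks cover the most common failures (container names as used elsewhere in this README):

```bash
# Confirm the NameNode currently sees both workers; expect "Live datanodes (2)"
docker exec namenode hdfs dfsadmin -report | grep "Live datanodes"

# See whether another local process already holds a mapped port, e.g. the NameNode UI
lsof -i :9870
```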
- Default configuration disables HDFS permissions (HDFS_CONF_dfs_permissions_enabled=false)
- WebHDFS is enabled by default
- The root user is configured as the static user
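Since WebHDFS is enabled, the filesystem can also be queried over plain HTTP through the NameNode's web port. For example, listing the root directory:

```bash
# Returns a JSON FileStatuses listing for the HDFS root
curl -s "http://localhost:9870/webhdfs/v1/?op=LISTSTATUS"
```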