Docker Demo
A Demo using Docker containers
Let's use a real-world example to see how Hudi works end to end. For this purpose, a self-contained data infrastructure is brought up in a local Docker cluster on your computer. It requires the Hudi repo to have been cloned locally.
The steps have been tested on a Mac laptop.
Prerequisites
- Clone the Hudi repository to your local machine.
- Docker Setup: On Mac, follow the steps defined in Install Docker Desktop on Mac. To run Spark SQL queries, ensure at least 6 GB of memory and 4 CPUs are allocated to Docker (see Docker -> Preferences -> Advanced). Otherwise, Spark SQL queries could be killed because of memory issues.
- kcat: A command-line utility to publish to and consume from Kafka topics. To install kcat, use
  brew install kcat
- /etc/hosts: The demo references many services running in containers by hostname. Add the following entries to /etc/hosts:
  127.0.0.1 adhoc-1
  127.0.0.1 adhoc-2
  127.0.0.1 namenode
  127.0.0.1 datanode1
  127.0.0.1 hiveserver
  127.0.0.1 hivemetastore
  127.0.0.1 kafkabroker
  127.0.0.1 sparkmaster
  127.0.0.1 zookeeper
- Java: Java SE Development Kit 8.
- Maven: A build automation tool for Java projects.
- jq: A lightweight and flexible command-line JSON processor. To install jq, use
  brew install jq
The demo has not been tested in some environments, such as Docker on Windows.
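The /etc/hosts entries listed in the prerequisites are easy to leave incomplete, so a quick check may help before starting the cluster. The following is a minimal sketch and not part of the Hudi repository: `missing_hosts` is a hypothetical helper name, and it simply prints every hostname that has no active 127.0.0.1 entry in the file it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Hudi).
# Prints each hostname that has no uncommented 127.0.0.1 entry in a hosts file.
# Usage: missing_hosts <hosts_file> <hostname>...
missing_hosts() {
  local hosts_file="$1"
  shift
  local host
  for host in "$@"; do
    # awk exits 0 only if some line starting with 127.0.0.1 lists the host
    # as a whole field before any trailing comment.
    if ! awk -v h="$host" '
      $1 == "127.0.0.1" {
        for (i = 2; i <= NF; i++) {
          if ($i ~ /^#/) break
          if ($i == h) found = 1
        }
      }
      END { exit found ? 0 : 1 }
    ' "$hosts_file"; then
      echo "$host"
    fi
  done
}

# Check the hostnames the demo expects; no output means all are present.
missing_hosts /etc/hosts adhoc-1 adhoc-2 namenode datanode1 hiveserver \
  hivemetastore kafkabroker sparkmaster zookeeper
```

Any hostname printed by the last command still needs a line in /etc/hosts.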
Setting up Docker Cluster
Build Hudi
The first step is to build Hudi. Note: this step builds Hudi with the default supported Scala version, 2.11.
NOTE: Make sure you've cloned the Hudi repository first.
cd <HUDI_WORKSPACE>
mvn clean package -Pintegration-tests -DskipTests
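If the build or a later step fails early, one thing worth ruling out is a missing prerequisite. The sketch below is an illustration only, assuming the tool names from the prerequisites (java, mvn, docker, kcat, jq); `have_tool` is a hypothetical helper name, and the check confirms only that each command is on the PATH, not that its version is suitable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Hudi).
# Returns 0 if the given command is found on PATH, non-zero otherwise.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool named in the prerequisites.
for tool in java mvn docker kcat jq; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

A tool reported as missing should be installed before rerunning the build or the demo steps.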
Bringing up Demo Cluster
The next step is to run the Docker Compose script and set up the configs for bringing up the cluster. These files are in the Hudi repository, which you should already have on your machine from the previous steps.
This should pull the Docker images from Docker Hub and set up the Docker cluster.
- Default
- Mac AArch64
cd docker
./setup_demo.sh
....
....
....
[+] Running 10/13
⠿ Container zookeeper Removed 8.6s
⠿ Container datanode1 Removed 18.3s
⠿ Container trino-worker-1 Removed 50.7s
⠿ Container spark-worker-1 Removed 16.7s
⠿ Container adhoc-2 Removed 16.9s
⠿ Container graphite Removed 16.9s
⠿ Container kafkabroker Removed 14.1s
⠿ Container adhoc-1 Removed 14.1s
⠿ Container presto-worker-1 Removed 11.9s
⠿ Container presto-coordinator-1 Removed 34.6s
.......
......
[+] Running 17/17
⠿ adhoc-1 Pulled 2.9s
⠿ graphite Pulled 2.8s
⠿ spark-worker-1 Pulled 3.0s
⠿ kafka Pulled 2.9s
⠿ datanode1 Pulled 2.9s
⠿ hivemetastore Pulled 2.9s
⠿ hiveserver Pulled 3.0s
⠿ hive-metastore-postgresql Pulled 2.8s
⠿ presto-coordinator-1 Pulled 2.9s
⠿ namenode Pulled 2.9s
⠿ trino-worker-1 Pulled 2.9s
⠿ sparkmaster Pulled 2.9s
⠿ presto-worker-1 Pulled 2.9s
⠿ zookeeper Pulled 2.8s
⠿ adhoc-2 Pulled 2.9s
⠿ historyserver Pulled 2.9s
⠿ trino-coordinator-1 Pulled 2.9s
[+] Running 17/17
⠿ Container zookeeper Started 41.0s
⠿ Container kafkabroker Started 41.7s
⠿ Container graphite Started 41.5s
⠿ Container hive-metastore-postgresql Running 0.0s
⠿ Container namenode Running 0.0s
⠿ Container hivemetastore Running 0.0s
⠿ Container trino-coordinator-1 Runni... 0.0s
⠿ Container presto-coordinator-1 Star... 42.1s
⠿ Container historyserver Started 41.0s
⠿ Container datanode1 Started 49.9s
⠿ Container hiveserver Running 0.0s