
Apache Hudi™ brings

to data lakes

What is Hudi 

Apache Hudi is an open data lakehouse platform, built on a high-performance open table format that brings database functionality to your data lakes. Hudi replaces slow, old-school batch data processing with an incremental processing framework for low-latency, minute-level analytics.
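To make the contrast between batch and incremental processing concrete, here is a minimal toy sketch in plain Python. It is not Hudi's API: `ToyTable`, `upsert`, `snapshot`, and `incremental` are hypothetical names invented for illustration, and the integer commit counter only stands in for the idea of a commit timeline. It assumes each write is stamped with a commit time, so a reader can ask for only the records that changed after a given commit instead of rescanning everything.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy in-memory keyed table (hypothetical; NOT Hudi's API)."""

    # Maps record key -> (commit_time, value).
    rows: dict = field(default_factory=dict)
    last_commit: int = 0

    def upsert(self, records: dict) -> int:
        """Insert new keys or overwrite existing ones under a new commit time."""
        self.last_commit += 1
        for key, value in records.items():
            self.rows[key] = (self.last_commit, value)
        return self.last_commit

    def snapshot(self) -> dict:
        """Batch-style read: scan every row in the table."""
        return {key: value for key, (_, value) in self.rows.items()}

    def incremental(self, since_commit: int) -> dict:
        """Incremental read: only rows written after `since_commit`."""
        return {
            key: value
            for key, (commit, value) in self.rows.items()
            if commit > since_commit
        }


table = ToyTable()
first = table.upsert({"u1": "alice", "u2": "bob"})
second = table.upsert({"u2": "bobby", "u3": "carol"})

# A downstream job that already processed `first` only needs the changes since then.
changed = table.incremental(since_commit=first)
```

In this sketch the downstream job sees only `u2` and `u3` on its second run, while `snapshot()` would return all three records. That difference in work per run is the intuition behind minute-level latency.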

Integrations 

Data Streaming
Apache Kafka
Apache Pulsar
Databases
PostgreSQL
MySQL
CDC
Debezium
Apache Flink CDC
File Formats
Apache Parquet
Apache ORC
Apache Avro
CSV
JSON
Lake Storage
Apache Hadoop
Amazon S3
Google Cloud Storage
Azure Blob Storage
Alibaba Cloud
IBM Cloud
Oracle Cloud
Tencent Cloud
MinIO
Data Catalogs
AWS Glue Data Catalog
Google BigQuery
Apache Hive Metastore
DataHub
Apache XTable (Incubating) (For sync)
Data Warehouses
Amazon Redshift
ClickHouse
Interactive Analytics
Presto
Trino
Apache Hive
AWS Athena
Google BigQuery
Apache Doris
StarRocks
Apache Impala
Data Processing
Apache Spark
Apache Flink
Databricks
AWS EMR
Azure HDInsight
Onehouse
Ray
Daft
Orchestration
dbt
Apache Airflow

Hudi Features 

Why Hudi 

The most innovative and completely open data lakehouse platform in the industry!

Trusted Platform

Battle tested and proven in production in some of the largest data lakes on the planet.

Open Source

Hudi has a thriving, growing community, built on contributions from people around the globe.

High Performance

Hudi's storage format is purpose-built to continuously deliver performance as data scales.

Data streams

Take advantage of built-in CDC sources and tools for streaming ingestion.

Join our Community 

Get technical help, influence the product roadmap & see what’s new with Hudi!

YouTube

LinkedIn

GitHub

Slack

Mailing list

X