Welcome to Apache Hudi!

Apache Hudi ingests and manages storage of large analytical datasets over DFS (HDFS or cloud stores).
Latest release 0.8.0

Hudi Data Lakes

Hudi brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.

Hudi Features

Upsert support with fast, pluggable indexing.

Atomically publish data with rollback support.

Snapshot isolation between writers & queries.

Savepoints for data recovery.

Manages file sizes and layout using statistics.

Async compaction of row & columnar data.

Timeline metadata to track lineage.

Optimize data lake layout with clustering.
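
To make the upsert feature more concrete, here is a minimal sketch of the write options a Spark job might pass to the Hudi datasource. The helper name `hudi_upsert_options`, the table name `trips`, the column names, and the `/tmp/hudi/trips` path are assumptions made for illustration. The `hoodie.*` keys are standard Hudi write configs, though it is worth checking them against the configuration reference for your release.

```python
from typing import Dict


def hudi_upsert_options(table_name: str, record_key: str,
                        precombine_field: str, partition_field: str) -> Dict[str, str]:
    """Build Spark datasource write options for a Hudi upsert (illustrative helper)."""
    return {
        "hoodie.table.name": table_name,
        # "upsert" updates records whose key already exists and inserts the rest.
        "hoodie.datasource.write.operation": "upsert",
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to pick a winner when two incoming records share a key.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Column that determines the partition path of a record.
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


# Hypothetical table and column names, for illustration only.
options = hudi_upsert_options("trips", "uuid", "ts", "partitionpath")

# With a SparkSession and the Hudi Spark bundle on the classpath, a DataFrame `df`
# could then be written roughly like this (not executed here):
#   df.write.format("hudi").options(**options).mode("append").save("/tmp/hudi/trips")
```

Keeping the options in a plain dictionary lets the same settings be reused across jobs and inspected without starting Spark.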