Apache Hudi ingests & manages storage of large analytical datasets over DFS (HDFS or cloud stores). Latest release: 0.8.0.
Hudi brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.
Upsert support with fast, pluggable indexing.
Atomically publish data with rollback support.
Snapshot isolation between writer & queries.
Savepoints for data recovery.
Manages file sizes and layout using statistics.
Async compaction of row & columnar data.
Timeline metadata to track lineage.
Optimize data lake layout with clustering.
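
To make the upsert, atomic publish, and rollback features above more concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Hudi's API: the names ToyTable, upsert, rollback, and the "id"/"fare" fields are hypothetical and chosen only for illustration. It assumes a record key that acts as the index, and a timeline that stores an undo log per commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy in-memory table: NOT Hudi's API, only the upsert/rollback idea."""
    records: dict = field(default_factory=dict)   # record key -> row (plays the role of the index)
    timeline: list = field(default_factory=list)  # committed (instant, undo log) pairs

    def upsert(self, instant, rows, key="id"):
        """Insert new keys, update existing ones, then publish as one commit."""
        staged = dict(self.records)            # stage on a copy so readers never see partial writes
        undo = {}
        for row in rows:
            k = row[key]
            if k not in undo:
                undo[k] = self.records.get(k)  # previous row, or None if the key is new
            staged[k] = row
        self.records = staged                  # a single reference swap publishes the commit
        self.timeline.append((instant, undo))

    def rollback(self):
        """Undo the most recent commit using its undo log."""
        instant, undo = self.timeline.pop()
        restored = dict(self.records)
        for k, old in undo.items():
            if old is None:
                restored.pop(k, None)          # key was inserted by that commit: remove it
            else:
                restored[k] = old              # key was updated by that commit: restore old row
        self.records = restored
        return instant


table = ToyTable()
table.upsert("001", [{"id": 1, "fare": 10.0}, {"id": 2, "fare": 20.0}])
table.upsert("002", [{"id": 2, "fare": 25.0}, {"id": 3, "fare": 30.0}])
after_upsert = dict(table.records)   # key 2 updated, key 3 inserted
rolled_back = table.rollback()       # table returns to its state after instant "001"
```

The sketch keeps everything in one dictionary for brevity. A real table format would keep data in files on DFS and use the timeline to decide which files are visible to queries.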