Release 0.12.1 (docs)

Migration Guide

  • This release (0.12.1) does not introduce any new table version, thus no migration is needed if you are on 0.12.0.
  • If migrating from an older release, please check the migration guide from the previous release notes, specifically the upgrade instructions in 0.6.0, 0.9.0, 0.10.0, 0.11.0, and 0.12.0.

Release Highlights

Improve Hudi CLI

Added commands to repair deprecated partitions, rename partitions, and trace file groups through a range of commits.

Fix invalid record key stats in Parquet metadata

The crux of the problem was that min/max statistics for the record keys were computed incorrectly during the (Spark-specific) row-writing Bulk Insert operation. This broke the key-range pruning flow within the Hoodie Bloom Index tagging sequence, causing updated records to be incorrectly tagged as "inserts" instead of "updates" and leading to duplicate records in the table.
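To make the failure mode concrete, below is a minimal sketch (in Scala) of what key-range pruning does; the case class and function names are illustrative, not Hudi's internal API. The Bloom Index first narrows the set of candidate files for a key using the min/max record keys stored in each file's Parquet footer, and only then consults the bloom filters. If those min/max statistics are wrong, the file actually holding a key can be pruned away, so an incoming update finds no candidate file, gets tagged as an insert, and is written a second time.

    // Illustrative sketch of key-range pruning; not Hudi's internal API.
    // FileKeyRange and candidateFiles are hypothetical names.
    case class FileKeyRange(fileId: String, minKey: String, maxKey: String)

    // A file can only contain `recordKey` if the key falls inside the
    // [minKey, maxKey] range recorded in that file's footer stats.
    def candidateFiles(ranges: Seq[FileKeyRange], recordKey: String): Seq[String] =
      ranges
        .filter(r => r.minKey <= recordKey && recordKey <= r.maxKey)
        .map(_.fileId)

    // With correct stats, an update to "key_42" is routed to file f1 and
    // tagged as an update. If the stored min/max keys were wrong (the bug
    // fixed here), the filter could drop f1, so the record would be
    // tagged as an "insert" and duplicated.
    val ranges = Seq(
      FileKeyRange("f1", "key_00", "key_49"),
      FileKeyRange("f2", "key_50", "key_99")
    )
    assert(candidateFiles(ranges, "key_42") == Seq("f1"))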

If all of the following apply to you:

  1. Using Spark as an execution engine
  2. Using Bulk Insert with row writing enabled (hoodie.datasource.write.row.writer.enable, enabled by default; see https://hudi.apache.org/docs/next/configurations#hoodiedatasourcewriterowwriterenable)
  3. Using Bloom Index for "UPSERT" operations, with range pruning enabled (hoodie.bloom.index.prune.by.ranges, enabled by default; see https://hudi.apache.org/docs/next/basic_configurations/#hoodiebloomindexprunebyranges), as in the sketch below

We recommend upgrading to 0.12.1 to avoid duplicate records in your pipeline.
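For reference, here is a minimal sketch of a write that exercises the affected combination, assuming a spark-shell session with the Hudi Spark bundle on the classpath (where `spark` is predefined). The option keys are the ones linked in the checklist above; the DataFrame, table name, fields, and path are placeholders:

    // Bulk Insert via the row-writing path, on a table using a Bloom
    // index with key-range pruning: the combination affected by the
    // incorrect record-key stats. Placeholder data for illustration.
    val df = spark.range(0, 100)
      .selectExpr("cast(id as string) as uuid", "current_timestamp() as ts")

    df.write.format("hudi")
      .option("hoodie.table.name", "my_table")                     // placeholder
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.datasource.write.operation", "bulk_insert")
      .option("hoodie.datasource.write.row.writer.enable", "true") // default
      .option("hoodie.index.type", "BLOOM")                        // default
      .option("hoodie.bloom.index.prune.by.ranges", "true")        // default
      .mode("append")
      .save("/tmp/my_table")                                       // placeholder

    // Subsequent "UPSERT" writes to this table rely on the record-key
    // stats written above; on 0.12.0 those stats could be wrong, so
    // updates could be mis-tagged as inserts and duplicated.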

Bug fixes

The 0.12.1 release is mainly intended for bug fixes and stability. The fixes span many components, including:

  • DeltaStreamer
  • Table config
  • Table services
  • Metadata table
  • Spark SQL support
  • Presto support
  • Hive Sync
  • Flink engine
  • Unit, functional, integration tests and CI

Raw Release Notes

The raw release notes are available here.