
Release 0.10.1 (docs)

Migration Guide

  • This release (0.10.1) does not introduce a new table version, so no migration is needed if you are on 0.10.0.
  • If you are migrating from an older release, please check the migration guides in the previous release notes, specifically the upgrade instructions in 0.6.0, 0.9.0, and 0.10.0.

Release Highlights

Explicit Spark 3 bundle names

In the previous release (0.10.0), we added Spark 3.1.x support and made it the default Spark 3 version to build against. In 0.10.1, we made the Spark 3 version explicit in the bundle name and published a new bundle for Spark 3.0.x. Specifically, these two bundles are available in the public Maven repository:

  • hudi-spark3.1.2-bundle_2.12-0.10.1.jar
  • hudi-spark3.0.3-bundle_2.12-0.10.1.jar
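As a rough sketch of how these bundles would be pulled in, you can pass their Maven coordinates when launching Spark (the `org.apache.hudi` group ID is the project's standard one; pick the bundle matching your Spark 3 minor version):

```shell
# Launch spark-shell with the Hudi bundle built for Spark 3.1.x;
# on Spark 3.0.x, use org.apache.hudi:hudi-spark3.0.3-bundle_2.12:0.10.1 instead.
spark-shell \
  --packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.10.1 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
```

Because the Spark 3 version is now part of the artifact name, mixing a bundle with a mismatched Spark runtime is easier to spot at dependency-resolution time.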

Repair Utility

We added a new repair utility, org.apache.hudi.utilities.HoodieRepairTool, to clean up dangling base and log files. The utility can be run as a separate Spark job as shown below.

spark-submit \
--class org.apache.hudi.utilities.HoodieRepairTool \
--driver-memory 4g \
--executor-memory 1g \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.catalogImplementation=hive \
--conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
--packages org.apache.spark:spark-avro_2.12:3.1.2 \
$HUDI_DIR/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.10.1.jar \
--mode dry_run \
--base-path base_path \
--assume-date-partitioning

Check out the javadoc of HoodieRepairTool for further instructions and examples.

Bug fixes

0.10.1 is mainly a bug-fix and stability release. The fixes span many components, including:

  • HoodieDeltaStreamer
  • Timeline related fixes
  • Table services
  • Metadata table
  • Spark SQL support
  • Timestamp-based key generator
  • Hive Sync
  • Flink and Java engines
  • Kafka Connect

Raw Release Notes

The raw release notes are available here.