- This release (0.10.1) does not introduce a new table version, so no migration is needed if you are on 0.10.0.
- If you are migrating from an older release, please check the migration guides in the previous release notes, specifically the upgrade instructions in 0.6.0, 0.9.0, and 0.10.0.
Explicit Spark 3 bundle names
In the previous release (0.10.0), we added Spark 3.1.x support and made it the default Spark 3 version to build with. In 0.10.1, we made the Spark 3 version explicit in the bundle name and published a new bundle for Spark 3.0.x. Specifically, two bundles are now available in the public Maven repository: one built for Spark 3.1.x and one for Spark 3.0.x.
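As a sketch of how a version-explicit bundle would be pulled in, a spark-shell session on Spark 3.1.x might be launched as below. The artifact coordinates shown are assumptions for illustration; verify the exact bundle names published for this release in the Maven repository.

```shell
# Launch spark-shell with a Spark-3.1-specific Hudi bundle pulled from Maven.
# NOTE: the artifact coordinates below are assumed for illustration;
# check the public Maven repository for the exact bundle name.
spark-shell \
  --packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.10.1 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
```

Pinning the Spark minor version in the bundle name avoids classpath mismatches between the Spark version Hudi was compiled against and the one running the job.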
We added a new repair utility,
org.apache.hudi.utilities.HoodieRepairTool, to clean up dangling base and log files. The utility
can be run as a separate Spark job as shown below.
```bash
spark-submit \
  --class org.apache.hudi.utilities.HoodieRepairTool \
  --driver-memory 4g \
  --executor-memory 1g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --packages org.apache.spark:spark-avro_2.12:3.1.2 \
  <path-to-hudi-utilities-bundle-jar> \
  --mode dry_run \
  --base-path base_path
```
Check out the javadoc in
HoodieRepairTool for more instructions and examples.
0.10.1 is mainly intended for bug fixes and stability improvements. The fixes span many components, including:
- Timeline related fixes
- Table services
- Metadata table
- Spark SQL support
- Timestamp-based key generator
- Hive Sync
- Flink and Java engines
- Kafka Connect
Raw Release Notes
The raw release notes are available here.