Releases

Release 0.5.2-incubating (docs)

Download Information

Migration Guide for this release

  • Write client restructuring has moved classes around (HUDI-554). The client package now contains all the client classes that handle transaction management. The func package has been renamed to execution, and some helpers have moved to client/utils. All compaction code previously under io now lives under table/compact, rollback code under table/rollback, and in general all code for individual operations under table. This change only affects apps/projects depending on hudi-client; users of deltastreamer/datasource do not need to change anything.
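For projects depending on hudi-client, the migration is mostly a matter of updating import statements. A sketch, using HoodieWriteClient as an example (verify the exact classes you use against the 0.5.2 javadocs):

```java
// Pre-0.5.2 location:
// import org.apache.hudi.HoodieWriteClient;

// 0.5.2 location, under the new client package:
import org.apache.hudi.client.HoodieWriteClient;
```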

Release Highlights

  • Support for overwriting the payload implementation in hoodie.properties by specifying the hoodie.compaction.payload.class config option. Previously, once the payload class was set in hoodie.properties, it could not be changed. In some cases, if the code is refactored and the jar updated, one may need to pass a new payload class name.
  • TimestampBasedKeyGenerator now supports CharSequence types. Previously, TimestampBasedKeyGenerator supported only four data types for the partition key: Double, Long, Float, and String. After extending the supported data types, any CharSequence is now accepted.
  • Hudi now supports incremental pulling from defined partitions via the hoodie.datasource.read.incr.path.glob config option. For use cases where only the incremental part of certain partitions is needed, this can run faster by loading only the relevant parquet files.
  • With 0.5.2, Hudi allows the partition path to be updated when using the GLOBAL_BLOOM index. Previously, when a record was updated with a new partition path under GLOBAL_BLOOM, Hudi ignored the new partition path and updated the record in its original partition path. Now, Hudi inserts records into their new partition paths and deletes them from the old partition paths. Setting hoodie.index.bloom.update.partition.path=true enables this behavior.
  • A new JdbcbasedSchemaProvider has been added to fetch schemas through JDBC. This is very helpful for use cases where users want to synchronize data from MySQL and, at the same time, obtain the schema directly from the database.
  • Simplified HoodieBloomIndex by removing the 2GB-limit handling. Prior to Spark 2.4.0, each Spark partition had a 2GB limit. Since Hudi 0.5.1 upgraded to Spark 2.4.4, that limitation no longer applies, so the safe-parallelism constraint in HoodieBloomIndex has been removed.
  • CLI related changes:
    • Allows users to specify an option to print additional commit metadata, e.g. Total Log Blocks, Total Rollback Blocks, Total Updated Records Compacted, and so on.
    • Supports temp_query and temp_delete to query and delete a temp view. This command creates a temp table against which users can write HiveQL queries to filter for the desired rows. For example,
      temp_query --sql "select Instant, NumInserts, NumWrites from satishkotha_debug where FileId='ed33bd99-466f-4417-bd92-5d914fa58a8f' and Instant > '20200123211217' order by Instant"
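Several of the highlights above are driven by new configuration options. A minimal, illustrative fragment combining them might look like this (the payload class and glob value are examples only, not recommendations):

```properties
# Override the payload class recorded in hoodie.properties (new in 0.5.2);
# the class shown is Hudi's default payload and serves only as an example.
hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload

# Let the GLOBAL_BLOOM index move records whose partition path changed (new in 0.5.2).
hoodie.index.bloom.update.partition.path=true

# Incrementally pull only the partitions matching this glob (value is illustrative).
hoodie.datasource.read.incr.path.glob=/year=2020/month=03/*
```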
      

Raw Release Notes

The raw release notes are available here

Release 0.5.1-incubating (docs)

Download Information

Migration Guide for this release

  • In 0.5.1, the community restructured the key generator package. The key generator-related classes have been moved from org.apache.hudi to org.apache.hudi.keygen.
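A sketch of the resulting import change, using SimpleKeyGenerator as an example (the same pattern applies to the other key generator classes):

```java
// Pre-0.5.1 location:
// import org.apache.hudi.SimpleKeyGenerator;

// 0.5.1 location:
import org.apache.hudi.keygen.SimpleKeyGenerator;
```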

Release 0.5.0-incubating (docs)

Release Highlights

  • Package and format renaming from com.uber.hoodie to org.apache.hudi (See migration guide section below)
  • Major redo of Hudi bundles to address class and jar version mismatches in different environments
  • Upgrade from Hive 1.x to Hive 2.x for compile-time dependencies; Hive 1.x runtime integration still works with a patch: see the discussion thread
  • DeltaStreamer now supports continuous running mode with managed concurrent compaction
  • Support for Composite Keys as record key
  • HoodieCombinedInputFormat to scale huge hive queries running on Hoodie tables
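As an illustration of the continuous mode, a DeltaStreamer launch might be sketched as follows (the jar name, paths, and table name are placeholders; confirm the available flags with --help for your version):

```shell
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --target-base-path /path/to/hudi/table \
  --target-table example_table \
  --continuous
```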

Migration Guide for this release

This is the first Apache release for Hudi. Prior to this release, Hudi jars were published using the “com.uber.hoodie” Maven coordinates. We have a migration guide to help with the transition.

Raw Release Notes

The raw release notes are available here

Release 0.4.7

Release Highlights

  • Major release with fundamental changes to filesystem listing & write failure handling
  • Introduced the first version of HoodieTimelineServer that runs embedded on the driver
  • With all executors fetching filesystem listings via RPC to the timeline server, filesystem listing is drastically reduced
  • Failing concurrent write tasks are now handled differently, making them robust against Spark stage retries
  • Bug fixes and cleanup around indexing and compaction

PR LIST

  • Skip Meta folder when looking for partitions. #698
  • HUDI-134 - Disable inline compaction for Hoodie Demo. #696
  • Default implementation for HBase index qps allocator. #685
  • Handle duplicate record keys across partitions. #687
  • Fix up offsets not available on leader exception. #650