Release 1.0.0-beta2
Release 1.0.0-beta2 (docs)
Apache Hudi 1.0.0-beta2 is the second beta release of Apache Hudi. This release is meant for early adopters to try out the new features and provide feedback. The release is not meant for production use.
Migration Guide
This release contains major format changes, as described in the highlights below. We encourage users to try out the 1.0.0-beta2 features on new tables. The 1.0 general availability (GA) release will support automatic table upgrades from 0.x versions and will remain fully backward compatible when reading 0.x Hudi tables with 1.0, for a seamless migration experience.
Because the timeline format and log file format have changed in this beta release, rolling upgrades from older versions to this release are not recommended.
Highlights
Format changes
HUDI-6242 is the main epic covering all the format change proposals, which are also partly covered in the Hudi 1.0 tech specification. The following are the main changes in this release:
Timeline
No major changes in this release. Refer to 1.0.0-beta1#timeline for more details.
Log File Format
In addition to the fields in the log file header added in 1.0.0-beta1, we also store a flag, IS_PARTIAL, to indicate whether the log block contains partial updates or not.
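As a rough illustration of why a reader needs such a flag, the sketch below merges a log record into a base row in two ways, depending on whether the record is a partial update. It is a conceptual model only: the function `apply_log_record` and the dictionary-based rows are hypothetical and do not reflect Hudi's actual reader code or log block layout.

```python
def apply_log_record(base_row, log_row, is_partial):
    """Merge one log record into its base row (illustrative only).

    is_partial=True:  log_row carries only the changed columns, so the
                      remaining columns are taken from base_row.
    is_partial=False: log_row is a complete row and replaces base_row.
    """
    if is_partial:
        return {**base_row, **log_row}
    return dict(log_row)


base = {"key": "k1", "rider": "A", "fare": 10}

# Partial update: only "fare" changes, "rider" is kept from the base row.
partial = apply_log_record(base, {"key": "k1", "fare": 12}, is_partial=True)

# Full update: the log record replaces the whole row.
full = apply_log_record(
    base, {"key": "k1", "rider": "B", "fare": 7}, is_partial=False
)
```

Without the flag, a reader could not tell whether columns missing from a log record should be filled in from the base row or treated as absent.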
Metadata indexes
In 1.0.0-beta1, we added support for functional indexes. In 1.0.0-beta2, we have added support for secondary indexes and the partition stats index to the multi-modal indexing subsystem.
Secondary Index
Secondary indexes allow users to create indexes on columns that are not part of the record key columns in Hudi tables (for record key fields, Hudi supports the Record-level Index). Secondary indexes can be used to speed up queries with predicates on columns other than the record key columns.
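The idea can be sketched in a few lines of plain Python. This is a toy model, not Hudi's implementation: the table, the `city` column, and the helper names `build_secondary_index` and `lookup` are all made up for illustration.

```python
from collections import defaultdict

# Toy table: record key -> row. "city" is a non-key column.
rows = {
    "k1": {"rider": "A", "city": "sf"},
    "k2": {"rider": "B", "city": "nyc"},
    "k3": {"rider": "C", "city": "sf"},
}


def build_secondary_index(rows, column):
    """Map each value of `column` to the set of record keys holding it."""
    index = defaultdict(set)
    for key, row in rows.items():
        index[row[column]].add(key)
    return index


def lookup(rows, index, value):
    """Answer `column = value` by fetching only the indexed record keys."""
    return [rows[k] for k in sorted(index.get(value, ()))]


city_index = build_secondary_index(rows, "city")
sf_rows = lookup(rows, city_index, "sf")
```

A predicate on `city` is answered from the index instead of scanning every row, which is the benefit a secondary index is meant to provide.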
Partition Stats Index
Partition stats index aggregates statistics at the partition level for the columns for which it is enabled. This helps in efficient partition pruning even for non-partition fields.
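A minimal sketch of the pruning idea, assuming the statistics kept per partition are a simple minimum and maximum. The partition paths, the `fare` column, and the helper names are hypothetical, and the actual index layout in Hudi may differ.

```python
# Toy data: partition path -> values of a non-partition column ("fare").
partitions = {
    "date=2024-01-01": [5.0, 12.5, 8.0],
    "date=2024-01-02": [40.0, 55.0],
    "date=2024-01-03": [9.0, 31.0],
}


def build_partition_stats(partitions):
    """Aggregate a (min, max) pair per partition for the column."""
    return {p: (min(v), max(v)) for p, v in partitions.items()}


def prune(stats, lower_bound):
    """Keep only partitions that could hold a value > lower_bound."""
    return sorted(p for p, (_, hi) in stats.items() if hi > lower_bound)


stats = build_partition_stats(partitions)

# A query such as "fare > 30" only needs to read the surviving partitions.
candidates = prune(stats, 30.0)
```

Here the first partition is skipped because its maximum `fare` is below the bound, even though `fare` is not a partition field.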
To try out these features, refer to the SQL guide.
API Changes
Positional Merging with Filegroup Reader
In 1.0.0-beta1, we added a new filegroup reader, which provides 5.7x performance benefits for snapshot queries on Merge-on-Read tables with updates. The reader now provides position-based merging, as an alternative to the existing key-based merging, and can skip pages based on record positions. The new filegroup reader is integrated with Spark and Hive, and is enabled by default. To enable positional merging, set the following config:
hoodie.merge.use.record.positions=true
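To make the difference between the two merge strategies concrete, here is a conceptual sketch in plain Python. It assumes a record's position is simply its row offset within the base file; the data and the functions `merge_by_key` and `merge_by_position` are illustrative and are not part of the filegroup reader's API.

```python
# Base file rows in storage order; the list index is the record position.
base_file = [
    {"key": "k1", "fare": 10},
    {"key": "k2", "fare": 20},
    {"key": "k3", "fare": 30},
]


def merge_by_key(base, updates_by_key):
    """Key-based merging: each base row is matched by its record key."""
    return [updates_by_key.get(row["key"], row) for row in base]


def merge_by_position(base, updates_by_position):
    """Position-based merging: updates are addressed by row position."""
    merged = list(base)  # shallow copy, base stays untouched
    for pos, row in updates_by_position.items():
        merged[pos] = row
    return merged


update = {"key": "k2", "fare": 25}
by_key = merge_by_key(base_file, {"k2": update})
by_position = merge_by_position(base_file, {1: update})
```

Both strategies produce the same merged rows. The positional variant never compares record keys, and knowing which positions changed is also what makes it possible to skip untouched ranges of rows.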
Hudi-Flink Enhancements
This release adds support for lookup joins. A lookup join is typically used to enrich a table with data that is queried from an external system. The join requires one table to have a processing time attribute and the other table to be backed by a lookup source connector. Head over to the Flink SQL guide to try out this feature.
Raw Release Notes
The raw release notes are available here.