Release 0.10.0
Release 0.10.0 (docs)
Migration Guide
- If migrating from an older release, please also check the upgrade instructions for each subsequent release below.
- 0.10.0 includes foundational fixes to the metadata table, so any existing metadata table is cleaned up as part of the upgrade. Whenever Hudi is launched with the newer table version, i.e. 3 (that is, when moving from an earlier release to 0.10.0), an upgrade step is executed automatically. This automatic upgrade step happens just once per Hudi table, because hoodie.table.version is updated in the properties file after the upgrade completes.
- Similarly, a command-line tool for downgrading (command: downgrade) has been added for users who want to downgrade Hudi from table version 3 to 2, i.e. move from Hudi 0.10.0 to a pre-0.10.0 release. It must be executed from a 0.10.0 hudi-cli binary/script.
- We have made major fixes around the metadata table in the 0.10.0 release and recommend that users try it out for better performance from optimized file listings. As part of the upgrade, please follow the steps below to enable the metadata table.
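The once-per-table behavior of the automatic upgrade hinges on the hoodie.table.version property. The following is a minimal sketch of that check, not Hudi's actual implementation: it assumes the property file contains plain `key=value` lines, and the helper names and sample contents are hypothetical.

```python
# Hypothetical helpers for illustration; Hudi performs this check internally.
TARGET_TABLE_VERSION = 3  # table version used by 0.10.0, per the notes above


def read_table_version(properties_text: str) -> int:
    """Return hoodie.table.version from key=value text (assumed format)."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hoodie.table.version":
            return int(value.strip())
    raise KeyError("hoodie.table.version not found")


def needs_upgrade(properties_text: str) -> bool:
    """True when the recorded table version is older than the target."""
    return read_table_version(properties_text) < TARGET_TABLE_VERSION


# Hypothetical property file contents for a table written by an older release.
sample = "hoodie.table.name=trips\nhoodie.table.version=2\n"
print(needs_upgrade(sample))
```

Once the recorded version reaches 3, `needs_upgrade` returns False, which mirrors why the upgrade step runs only once per table.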
Prerequisites for enabling metadata table
Hudi writes and reads have to perform a “list files” operation on the file system to get the current view of the system. This can be very costly on cloud stores, which may throttle your requests depending on the scale/size of your dataset. To address this, we introduced a metadata table in 0.7.0 to cache the file listing for the table. With 0.10.0, we have made a foundational fix to the metadata table: it now uses synchronous updates instead of async updates, which simplifies the overall design and helps in building future enhancements such as multi-modal indexing. The feature can be turned on using the config hoodie.metadata.enable. By default, metadata table based file listing is disabled.
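As a small illustration of the config mentioned above, the sketch below adds hoodie.metadata.enable to a set of writer options. The helper name and the table name are hypothetical, and the commented Spark call only shows one plausible way such options could be passed to a writer.

```python
def with_metadata_table(write_options: dict) -> dict:
    """Return a copy of the writer options with the metadata table enabled."""
    options = dict(write_options)  # leave the caller's dict untouched
    options["hoodie.metadata.enable"] = "true"
    return options


base_options = {"hoodie.table.name": "trips"}  # hypothetical table name
options = with_metadata_table(base_options)

# Assumed usage with a Spark DataFrame `df` and a table path `base_path`:
# df.write.format("hudi").options(**options).mode("append").save(base_path)
```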
Deployment Model 1: If your current deployment model is single writer and all table services (cleaning, clustering, compaction) are configured to be inline, then you can turn on the metadata table without any additional configuration.
Deployment Model 2: If your current deployment model is multi-writer with lock providers configured, then you can turn on the metadata table without any additional configuration.
Deployment Model 3: If your current deployment model is single writer with async table services (such as cleaning, clustering, compaction) configured, then you must configure lock providers before turning on the metadata table. Even if you already have the metadata table turned on, if your deployment model employs async table services, you must configure lock providers before upgrading to this release.
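The three deployment models above reduce to one rule: a lock provider is needed whenever more than one process can touch the table. The sketch below encodes that rule. The class and function names are hypothetical, and treating a multi-writer setup without a lock provider as not ready is an assumption drawn from Model 2, which presumes lock providers are already configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of a Hudi deployment."""
    multi_writer: bool
    async_table_services: bool
    lock_provider_configured: bool


def can_enable_metadata_table(d: Deployment) -> bool:
    """True when the metadata table can be enabled without further setup."""
    needs_lock = d.multi_writer or d.async_table_services
    return d.lock_provider_configured or not needs_lock


model_1 = Deployment(multi_writer=False, async_table_services=False,
                     lock_provider_configured=False)
model_2 = Deployment(multi_writer=True, async_table_services=False,
                     lock_provider_configured=True)
# Model 3 before lock providers have been configured.
model_3_unlocked = Deployment(multi_writer=False, async_table_services=True,
                              lock_provider_configured=False)

for name, d in [("model 1", model_1), ("model 2", model_2),
                ("model 3, no lock", model_3_unlocked)]:
    print(name, can_enable_metadata_table(d))
```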
Upgrade steps
For deployment model 1, restarting the single writer with 0.10.0 is sufficient to upgrade the table.
For deployment model 2 with multi-writers, you can bring up the writers with 0.10.0 sequentially. If you intend to use the metadata table, the metadata config must be enabled across all the writers. Otherwise, it will lead to loss of data from the inconsistent writer.
For deployment model 3 with single writer and async table services, restarting the single writer along with the async services is sufficient to upgrade the table. If async services are configured to run separately from the writer, then the metadata config must be consistent across all writers and async jobs. Remember to configure the lock providers as detailed above if enabling the metadata table.
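Because an inconsistent metadata config across writers and async jobs can lead to data loss, it may be worth checking the configs before restarting anything. The following is a minimal sketch of such a pre-flight check. The function names and job names are hypothetical, and an unset hoodie.metadata.enable is treated as disabled, matching the default described earlier.

```python
METADATA_KEY = "hoodie.metadata.enable"


def metadata_enabled(config: dict) -> bool:
    """Interpret the config value; an unset key counts as disabled."""
    return str(config.get(METADATA_KEY, "false")).strip().lower() == "true"


def find_inconsistent_jobs(job_configs: dict) -> list:
    """Return jobs with the metadata table disabled while another job enables it."""
    enabled = {name for name, cfg in job_configs.items() if metadata_enabled(cfg)}
    if not enabled:
        return []  # nobody uses the metadata table, so nothing can diverge
    return sorted(set(job_configs) - enabled)


# Hypothetical jobs: one writer plus a separately running async compaction job.
jobs = {
    "writer": {METADATA_KEY: "true"},
    "async_compaction": {},  # key left unset, so it is treated as disabled
}
print(find_inconsistent_jobs(jobs))
```

Any job name returned by `find_inconsistent_jobs` should have its config aligned before the writers and async services are brought up on 0.10.0.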
To leverage the metadata table based file listings, readers must have the metadata config turned on explicitly while querying. Otherwise, readers will not use the file listings from the metadata table.