Indexes
In databases, indexes are auxiliary data structures maintained to quickly locate the records needed, without reading unnecessary data from storage. Because Hudi's design is heavily optimized for handling mutable change streams with varied write patterns, Hudi treats indexing as an integral part of that design and has supported indexing capabilities since its inception, to speed up writes on the data lakehouse while still providing columnar query performance.
Mapping keys to file groups
The most foundational index mechanism in Hudi consistently maps a given key (record key plus, optionally, partition path) to a file id. Other types of indexes, such as secondary indexes, build on this foundation. The mapping between record key and file group/file id rarely changes once the first version of a record has been written to a file group. Only clustering, or cross-partition updates that are implemented as deletes + inserts, remap the record key to a different file group. Even then, a given record key is associated with exactly one file group at any completed instant on the timeline.
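The mapping described above can be pictured with a small in-memory sketch. This is only an illustration of the idea, not Hudi's implementation: the class `KeyToFileGroupIndex`, its methods, and the file group names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the key-to-file-group mapping.
# Key: (partition_path, record_key); value: file group id.
Key = Tuple[str, str]


class KeyToFileGroupIndex:
    def __init__(self) -> None:
        self._mapping: Dict[Key, str] = {}

    def lookup(self, partition_path: str, record_key: str) -> Optional[str]:
        """Return the file group that holds the key, or None for an unseen key."""
        return self._mapping.get((partition_path, record_key))

    def tag(self, partition_path: str, record_key: str, new_file_group: str) -> Tuple[str, str]:
        """Classify an incoming record as an update or an insert.

        A known key stays with its existing file group; an unseen key is
        assigned to `new_file_group` and the mapping is recorded.
        """
        key = (partition_path, record_key)
        existing = self._mapping.get(key)
        if existing is not None:
            return ("update", existing)
        self._mapping[key] = new_file_group
        return ("insert", new_file_group)


index = KeyToFileGroupIndex()
first = index.tag("2024/01/01", "order-1", "fg-001")   # first write of the key
second = index.tag("2024/01/01", "order-1", "fg-002")  # later write of the same key
print(first)   # ('insert', 'fg-001')
print(second)  # ('update', 'fg-001')
```

The second write is routed back to the file group chosen on the first write, which is the stability property the paragraph above describes.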
Need for indexing
For Copy-On-Write tables, indexing enables fast upsert/delete operations by avoiding the need to join against the entire dataset to determine which files to rewrite. For Merge-On-Read tables, indexing allows Hudi to bound the number of change records any given base file needs to be merged against. Specifically, a given base file needs to be merged only against updates for records that are part of that base file.
Figure: Comparison of merge cost for updates (dark blue blocks) against base files (light blue blocks)
In contrast,
- Designs without an indexing component (e.g., Apache Hive, Apache Iceberg) end up having to merge all the base files against all incoming update/delete records (10-100x more read amplification).
- Designs that implement heavily write-optimized OLTP data structures like LSM trees do not require an indexing component, but they perform poorly on scan-heavy workloads against cloud storage, making them unsuitable for serving analytical queries.
Hudi achieves both great write performance and great read performance at the extra storage cost of an index, and that index can unlock a lot more, as we explore below.
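The difference in merge cost can be made concrete with a toy count of how many update records are read during merging. The function names and the numbers below are illustrative assumptions, not measurements; the model simply counts one read per update record per base file it is merged against.

```python
from typing import Dict


def update_reads_without_index(updates_per_base_file: Dict[str, int]) -> int:
    """Without an index, every base file is merged against all incoming updates."""
    total_updates = sum(updates_per_base_file.values())
    return len(updates_per_base_file) * total_updates


def update_reads_with_index(updates_per_base_file: Dict[str, int]) -> int:
    """With an index, each base file is merged only against its own updates."""
    return sum(updates_per_base_file.values())


# Hypothetical batch: 4 incoming updates touching 2 of 4 base files.
updates_per_base_file = {"base-1": 3, "base-2": 1, "base-3": 0, "base-4": 0}
print(update_reads_without_index(updates_per_base_file))  # 16
print(update_reads_with_index(updates_per_base_file))     # 4
```

In this toy model the gap grows with the number of base files, since the unindexed cost scales with files times updates while the indexed cost scales with updates alone.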
Multi-modal Indexing
Multi-modal indexing, introduced in the Hudi 0.11.0 release, is a re-imagination of what a general purpose indexing subsystem should look like for the lake. It is implemented by enhancing the metadata table so that new index types can be added as new partitions, complemented by an asynchronous mechanism for index construction.
This enhancement supports a range of indexes within the metadata table, significantly improving the efficiency of both writing to and reading from the table.
Figure: Indexes in Hudi
Bloom Filters
Bloom filter indexes are stored as the bloom_filter partition in the metadata table. The bloom index employs range-based pruning on the minimum and maximum values of the record keys, together with bloom-filter-based lookups, to tag incoming records. For large tables, this normally involves reading the footers of all matching data files to fetch their bloom filters, which can be expensive in the case of random updates across the entire dataset. The metadata table index avoids this by storing the bloom filters of all data files centrally, so the footers do not have to be scanned from each data file.
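The two-step lookup (range pruning, then bloom filter check) can be sketched as follows. This is a simplified illustration under stated assumptions: `FileKeyStats`, `candidate_files`, the filter size, and the SHA-256-based hashing are all hypothetical choices for the example and do not reflect Hudi's actual bloom filter format.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class FileKeyStats:
    """Hypothetical per-file entry: record key range plus a tiny bloom filter."""
    file_id: str
    min_key: str
    max_key: str
    num_bits: int = 256
    num_hashes: int = 3
    bits: int = 0  # bit set stored as a Python int

    def _positions(self, key: str) -> List[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big")
            % self.num_bits
            for i in range(self.num_hashes)
        ]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_stats(file_id: str, keys: List[str]) -> FileKeyStats:
    stats = FileKeyStats(file_id, min(keys), max(keys))
    for key in keys:
        stats.add(key)
    return stats


def candidate_files(key: str, files: List[FileKeyStats]) -> List[str]:
    """Range-prune first, then consult the bloom filter of the surviving files."""
    in_range = [f for f in files if f.min_key <= key <= f.max_key]
    return [f.file_id for f in in_range if f.might_contain(key)]


files = [
    build_stats("file-a", ["k001", "k005", "k009"]),
    build_stats("file-b", ["k100", "k150"]),
]
print(candidate_files("k005", files))  # ['file-a']
print(candidate_files("k999", files))  # []
```

Because a bloom filter can report false positives, the candidate list is only a shortlist; the files on it still have to be checked for the actual key.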
The following configurations enable and tune bloom filter indexing.
| Config Name | Default | Description |
|---|---|---|
| hoodie.metadata.index.bloom.filter.enable | false | Enable indexing of bloom filters of user data files under the metadata table. When enabled, the metadata table will have a partition to store the bloom filter index, which will be used during index lookups. Config Param: ENABLE_METADATA_INDEX_BLOOM_FILTER. Since Version: 0.11.0 |
| hoodie.bloom.index.prune.by.ranges | true | Only applies if index type is BLOOM. When true, range information from files is leveraged to speed up index lookups. Particularly helpful if the key has a monotonically increasing prefix, such as a timestamp. If the record key is completely random, it is better to turn this off, since range pruning will only add extra overhead to the index lookup. Config Param: BLOOM_INDEX_PRUNE_BY_RANGES |
| hoodie.bloom.index.update.partition.path | true | Only applies if index type is GLOBAL_BLOOM. When set to true, an update including the partition path of a record that already exists will result in inserting the incoming record into the new partition and deleting the original record in the old partition. When set to false, the original record will only be updated in the old partition. Config Param: BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE |
| hoodie.bloom.index.use.metadata | false | Only applies if index type is BLOOM. When true, the index lookup uses bloom filters and column stats from the metadata table, when available, to speed up the process. Config Param: BLOOM_INDEX_USE_METADATA. Since Version: 0.11.0 |
| hoodie.metadata.index.bloom.filter.column.list | (N/A) | Comma-separated list of columns for which the bloom filter index will be built. If not set, only the record key will be indexed. Config Param: BLOOM_FILTER_INDEX_FOR_COLUMNS. Since Version: 0.11.0 |
| hoodie.metadata.index.bloom.filter.file.group.count | 4 | Metadata bloom filter index partition file group count. This controls the size of the base and log files and the read parallelism in the bloom filter index partition. The recommendation is to size the file group count such that the base files are under 1GB. Config Param: METADATA_INDEX_BLOOM_FILTER_FILE_GROUP_COUNT. Since Version: 0.11.0 |
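
As a rough sketch of how the settings in the table might be combined, the helper below assembles a dictionary of writer options. The function `bloom_index_options` and its parameters are hypothetical; `hoodie.index.type` is not listed in the table above and is included here as an assumption about how the BLOOM index type is selected.

```python
from typing import Dict


def bloom_index_options(use_metadata_table: bool = True, random_keys: bool = False) -> Dict[str, str]:
    """Assemble bloom index options (hypothetical helper, not a Hudi API)."""
    opts = {
        # Assumption: selects the BLOOM index type; not part of the table above.
        "hoodie.index.type": "BLOOM",
        # Range pruning only adds overhead when record keys are completely random.
        "hoodie.bloom.index.prune.by.ranges": str(not random_keys).lower(),
    }
    if use_metadata_table:
        # Store bloom filters centrally and use them during index lookups.
        opts["hoodie.metadata.index.bloom.filter.enable"] = "true"
        opts["hoodie.bloom.index.use.metadata"] = "true"
    return opts


opts = bloom_index_options(use_metadata_table=True, random_keys=True)
for name, value in sorted(opts.items()):
    print(f"{name}={value}")
```

Assuming a Spark DataFrame `df`, such a dictionary would typically be passed as `df.write.format("hudi").options(**opts)`, alongside the table's other required write options.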
Record Indexes
Record indexes are stored as the record_index partition in the metadata table, which contains the mapping from record key to location. The record index is a global index, enforcing key uniqueness across all partitions in the table. It locates records faster than the other existing indexes and can deliver speedups of orders of magnitude in large deployments where index lookup dominates write latency. To accommodate very high scales, it uses hash-based sharding of the key space. On the read side, the index also supports point lookups, significantly speeding up retrieval of the index mapping.
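Hash-based sharding of the key space can be illustrated with a short sketch. The hash function Hudi uses is not stated here, so the MD5-based `shard_for_key` below is a stand-in chosen only because it is stable across runs; the function name and shard count are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for_key(record_key: str, num_file_groups: int) -> int:
    """Map a record key to one of num_file_groups shards using a stable hash."""
    if num_file_groups <= 0:
        raise ValueError("num_file_groups must be positive")
    digest = hashlib.md5(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_file_groups


num_file_groups = 10
keys = [f"order-{i}" for i in range(1000)]
distribution = Counter(shard_for_key(k, num_file_groups) for k in keys)

# A point lookup only needs to read the single shard the key hashes to.
target_shard = shard_for_key("order-42", num_file_groups)
print(target_shard, sorted(distribution.items()))
```

Because the same key always hashes to the same shard, both index updates and point lookups touch one file group of the index rather than all of them.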
The following configurations control record index building and maintenance on the writer.
| Config Name | Default | Description |
|---|---|---|
| hoodie.metadata.record.index.enable | false | Create the Hudi record index within the metadata table. Config Param: RECORD_INDEX_ENABLE_PROP. Since Version: 0.14.0 |
| hoodie.metadata.record.index.growth.factor | 2.0 | The current number of records is multiplied by this number when estimating the number of file groups to create automatically. This helps account for growth in the number of records in the dataset. Config Param: RECORD_INDEX_GROWTH_FACTOR_PROP. Since Version: 0.14.0 |
| hoodie.metadata.record.index.max.filegroup.count | 10000 | Maximum number of file groups to use for the record index. Config Param: RECORD_INDEX_MAX_FILE_GROUP_COUNT_PROP. Since Version: 0.14.0 |
| hoodie.metadata.record.index.max.filegroup.size | 1073741824 | Maximum size in bytes of a single file group. Larger file groups take longer to compact. Config Param: RECORD_INDEX_MAX_FILE_GROUP_SIZE_BYTES_PROP. Since Version: 0.14.0 |
| hoodie.metadata.record.index.min.filegroup.count | 10 | Minimum number of file groups to use for the record index. Config Param: RECORD_INDEX_MIN_FILE_GROUP_COUNT_PROP. Since Version: 0.14.0 |
| hoodie.record.index.update.partition.path | false | Similar to hoodie.bloom.index.update.partition.path, but for the record index. When set to true, an update including the partition path of a record that already exists will result in inserting the incoming record into the new partition and deleting the original record in the old partition. When set to false, the original record will only be updated in the old partition. Config Param: RECORD_INDEX_UPDATE_PARTITION_PATH_ENABLE. Since Version: 0.14.0 |
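
To show how the sizing settings in the table could interact, here is a minimal sketch that estimates a file group count from the table's default values. The formula, the function `estimate_file_group_count`, and the `avg_entry_bytes` parameter are assumptions made for illustration; the exact estimation logic used by Hudi may differ.

```python
import math
from typing import Dict, Union

# Defaults copied from the table above.
RECORD_INDEX_DEFAULTS: Dict[str, Union[int, float]] = {
    "hoodie.metadata.record.index.growth.factor": 2.0,
    "hoodie.metadata.record.index.max.filegroup.count": 10000,
    "hoodie.metadata.record.index.max.filegroup.size": 1073741824,
    "hoodie.metadata.record.index.min.filegroup.count": 10,
}


def estimate_file_group_count(
    num_records: int,
    avg_entry_bytes: int,  # assumed average size of one index entry
    conf: Dict[str, Union[int, float]] = RECORD_INDEX_DEFAULTS,
) -> int:
    """Project growth, size the index, then clamp to the configured bounds."""
    growth = conf["hoodie.metadata.record.index.growth.factor"]
    max_size = conf["hoodie.metadata.record.index.max.filegroup.size"]
    min_count = int(conf["hoodie.metadata.record.index.min.filegroup.count"])
    max_count = int(conf["hoodie.metadata.record.index.max.filegroup.count"])

    projected_bytes = num_records * growth * avg_entry_bytes
    needed = math.ceil(projected_bytes / max_size)
    return max(min_count, min(max_count, needed))


small = estimate_file_group_count(1_000_000, 50)
medium = estimate_file_group_count(10_000_000_000, 50)
large = estimate_file_group_count(1_000_000_000_000, 50)
print(small, medium, large)  # 10 932 10000
```

Under these assumptions, small tables fall back to the minimum count, very large tables are capped at the maximum, and tables in between get enough file groups to keep each one under the maximum file group size.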