Maintaining well-sized files can improve query performance significantly
Different key generators available with Apache Hudi
Introduce clustering feature to change data layout
How T3Go’s high-performance data lake, built with Apache Hudi and Alluxio, cut data ingestion time into the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio together saw queries on the lake run up to 10 times faster.
Detailing different indexing mechanisms in Hudi and when to use each of them
Apply record level changes from relational databases to Amazon S3 data lake using Apache Hudi on Amazon EMR and AWS Database Migration Service
AWS blog showing how to build a CDC pipeline that captures data from an Amazon RDS for MySQL database using AWS DMS and applies those changes to an Amazon S3 dataset using Apache Hudi on Amazon EMR.
The design and latest progress of the integration of Apache Hudi and Apache Flink.
Solution to set up a new data and analytics platform using Apache Hudi on Amazon EMR and other managed services, including Amazon QuickSight for data visualization.
Ingesting multiple tables with Hudi in a single go is now possible. This blog gives a detailed explanation of how to achieve this using HoodieMultiTableDeltaStreamer.java
Mechanisms for executing compaction jobs in Hudi asynchronously
Migrating a large Parquet table to Apache Hudi without having to rewrite the entire dataset.
How Apache Hudi enables incremental data processing.
Introducing the feature of reporting Hudi metrics via the Datadog HTTP API
Integrating Hudi’s real-time and read-optimized query capabilities into Apache Zeppelin’s notebook
Learn how to copy or export a Hudi dataset in various formats.
In this blog, we will build an end-to-end solution for capturing changes from a MySQL instance running on AWS RDS to a Hudi table on S3, using capabilities in the Hudi 0.5.1 release.
Deletes are supported at a record level in Hudi as of the 0.5.1 release. This is a “how to” blog on deleting records in Hudi.
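As a taste of what the post covers, here is a minimal sketch of the Spark datasource write options involved in a record-level delete; the table name, record key field, precombine field, and storage path are all illustrative assumptions, not values from the post.

```python
# Hypothetical sketch: Spark write options for a record-level delete in Hudi.
# Table name, key field, ordering field, and path below are assumptions.
delete_opts = {
    "hoodie.table.name": "my_table",                    # assumed table name
    "hoodie.datasource.write.recordkey.field": "uuid",  # assumed record key
    "hoodie.datasource.write.precombine.field": "ts",   # assumed ordering field
    "hoodie.datasource.write.operation": "delete",      # switches the write to delete semantics
}

# With a DataFrame `df` holding the keys of records to remove, the write
# would look roughly like (commented out since it needs a Spark session
# with the Hudi bundle on the classpath):
#
# df.write.format("org.apache.hudi").options(**delete_opts) \
#   .mode("append").save("s3://bucket/path/to/my_table")
```

The key point is that a delete is just another write with the operation set to `delete`; the post walks through the full flow.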
Learn how to ingest changes into a Hudi dataset using Sqoop/Hudi
How to manually register a Hudi dataset in Hive using Beeline
In the coming weeks, we will be moving into our new home on the Apache Incubator.
We will be presenting Hudi & general concepts around how incremental processing works at Uber. Catch our talk “Incremental Processing on Hadoop At Uber”