Video Guides, Tutorials & Hands on labs
"Insert | Update | Delete On Datalake (S3) with Apache Hudi and glue Pyspark" - By Soumil Shah, Nov 17th 2022
"Build a Spark pipeline to analyze streaming data using AWS Glue, Apache Hudi, S3 and Athena" - By Soumil Shah, Nov 19th 2022
"Different table types in Apache Hudi | MOR and COW | Deep Dive | By Sivabalan Narayanan" - By Soumil Shah, Nov 20th 2022
"Simple 5 Steps Guide to get started with Apache Hudi and Glue 4.0 and query the data using Athena" - By Soumil Shah, Dec 8th 2022
"Build Datalakes on S3 with Apache HUDI in a easy way for Beginners with hands on labs | Glue" - By Soumil Shah, Dec 11th 2022
"How to convert Existing data in S3 into Apache Hudi Transaction Datalake with Glue | Hands on Lab" - By Soumil Shah, Dec 14th 2022
"Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and Apache Hudi | Hands on Labs" - By Soumil Shah, Dec 14th 2022
"Hands on Lab with using DynamoDB as lock table for Apache Hudi Data Lakes" - By Soumil Shah, Dec 14th 2022
"Build production Ready Real Time Transaction Hudi Datalake from DynamoDB Streams using Glue &kinesis" - By Soumil Shah, Dec 15th 2022
"Step by Step Guide on Migrate Certain Tables from DB using DMS into Apache Hudi Transaction Datalake" - By Soumil Shah, Dec 17th 2022
"Migrate Certain Tables from ONPREM DB using DMS into Apache Hudi Transaction Datalake with Glue|Demo" - By Soumil Shah, Dec 17th 2022
"Insert|Update|Read|Write|SnapShot| Time Travel |incremental Query on Apache Hudi datalake (S3)" - By Soumil Shah, Dec 18th 2022
"Build Production Ready Alternative Data Pipeline from DynamoDB to Apache Hudi | PROJECT DEMO" - By Soumil Shah, Dec 19th 2022
"Build Production Ready Alternative Data Pipeline from DynamoDB to Apache Hudi | Step by Step Guide" - By Soumil Shah, Dec 19th 2022
"Getting started with Kafka and Glue to Build Real Time Apache Hudi Transaction Datalake" - By Soumil Shah, Dec 20th 2022
"Learn Schema Evolution in Apache Hudi Transaction Datalake with hands on labs" - By Soumil Shah, Dec 21st 2022
"Apache Hudi with DBT Hands on Lab.Transform Raw Hudi tables with DBT and Glue Interactive Session" - By Soumil Shah, Dec 23rd 2022
Apache Hudi on Windows Machine Spark 3.3 and hadoop2.7 Step by Step guide and Installation Process - By Soumil Shah, Dec 24th 2022
Lets Build Streaming Solution using Kafka + PySpark and Apache HUDI Hands on Lab with code - By Soumil Shah, Dec 24th 2022
Bring Data from Source using Debezium with CDC into Kafka&S3Sink &Build Hudi Datalake | Hands on lab - By Soumil Shah, Dec 27th 2022
Comparing Apache Hudi's MOR and COW Tables: Use Cases from Uber - By Soumil Shah, Dec 28th 2022
Step by Step guide how to setup VPC & Subnet & Get Started with HUDI on EMR | Installation Guide | - By Soumil Shah, Dec 30th 2022
Streaming ETL using Apache Flink joining multiple Kinesis streams | Demo - By Soumil Shah, Jan 1st 2023
Transaction Hudi Data Lake with Streaming ETL from Multiple Kinesis Streams & Joining using Flink - By Soumil Shah, Jan 1st 2023
Great Article|Apache Hudi vs Delta Lake vs Apache Iceberg - Lakehouse Feature Comparison by OneHouse - By Soumil Shah, Jan 11th 2023
Build Real Time Streaming Pipeline with Apache Hudi Kinesis and Flink | Hands on Lab - By Soumil Shah, Jan 12th 2023
Build Real Time Low Latency Streaming pipeline from DynamoDB to Apache Hudi using Kinesis,Flink|Lab - By Soumil Shah, Jan 13th 2023
Real Time Streaming Data Pipeline From Aurora Postgres to Hudi with DMS , Kinesis and Flink |DEMO - By Soumil Shah, Jan 15th 2023
Real Time Streaming Pipeline From Aurora Postgres to Hudi with DMS , Kinesis and Flink |Hands on Lab - By Soumil Shah, Jan 16th 2023
Leverage Apache Hudi upsert to remove duplicates on a data lake | Hudi Labs - By Soumil Shah, Jan 17th 2023
Use Apache Hudi for hard deletes on your data lake for data governance | Hudi Labs - By Soumil Shah, Jan 17th 2023
How businesses use Hudi Soft delete features to do soft delete instead of hard delete on Datalake - By Soumil Shah, Jan 17th 2023
Leverage Apache Hudi incremental query to process new & updated data | Hudi Labs - By Soumil Shah, Jan 17th 2023
Global Bloom Index: Remove duplicates & guarantee uniquness | Hudi Labs - By Soumil Shah, Jan 17th 2023
Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs - By Soumil Shah, Jan 17th 2023
Precomb Key Overview: Avoid dedupes | Hudi Labs - By Soumil Shah, Jan 17th 2023
How do I identify Schema Changes in Hudi Tables and Send Email Alert when New Column added/removed - By Soumil Shah, Jan 20th 2023
How to detect and Mask PII data in Apache Hudi Data Lake | Hands on Lab - By Soumil Shah, Jan 21st 2023
Writing data quality and validation scripts for a Hudi data lake with AWS Glue and pydeequ| Hands on Lab - By Soumil Shah, Jan 23rd 2023
Learn How to restrict Intern from accessing Certain Column in Hudi Datalake with lake Formation - By Soumil Shah, Jan 28th 2023
How do I Ingest Extremely Small Files into Hudi Data lake with Glue Incremental data processing - By Soumil Shah, Feb 7th 2023
Create Your Hudi Transaction Datalake on S3 with EMR Serverless for Beginners in fun and easy way - By Soumil Shah, Feb 11th 2023
Streaming Ingestion from MongoDB into Hudi with Glue, kinesis&Event bridge&MongoStream Hands on labs - By Soumil Shah, Feb 18th 2023
Apache Hudi Bulk Insert Sort Modes a summary of two incredible blogs - By Soumil Shah, Feb 21st 2023
Use Glue 4.0 to take regular save points for your Hudi tables for backup or disaster Recovery - By Soumil Shah, Feb 22nd 2023
RFC-51 Change Data Capture in Apache Hudi like Debezium and AWS DMS Hands on Labs - By Soumil Shah, Feb 25th 2023
Python helper class which makes querying incremental data from Hudi Data lakes easy - By Soumil Shah, Feb 26th 2023
Develop Incremental Pipeline with CDC from Hudi to Aurora Postgres | Demo Video - By Soumil Shah, Mar 4th 2023
Power your Down Stream ElasticSearch Stack From Apache Hudi Transaction Datalake with CDC|Demo Video - By Soumil Shah, Mar 6th 2023
Power your Down Stream Elastic Search Stack From Apache Hudi Transaction Datalake with CDC|DeepDive - By Soumil Shah, Mar 6th 2023
How to Rollback to Previous Checkpoint during Disaster in Apache Hudi using Glue 4.0 Demo - By Soumil Shah, Mar 7th 2023
How do I read data from Cross Account S3 Buckets and Build Hudi Datalake in Datateam Account - By Soumil Shah, Mar 11th 2023
Query cross-account Hudi Glue Data Catalogs using Amazon Athena - By Soumil Shah, Mar 11th 2023
Learn About Bucket Index (SIMPLE) In Apache Hudi with lab - By Soumil Shah, Mar 15th 2023
Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi - By Soumil Shah, Mar 17th 2023
Push Hudi Commit Notification TO HTTP URI with Callback - By Soumil Shah, Mar 18th 2023
RFC - 18: Insert Overwrite in Apache Hudi with Example - By Soumil Shah, Mar 19th 2023
RFC 42: Consistent Hashing in Apache Hudi MOR Tables - By Soumil Shah, Mar 21st 2023
Data Analysis for Apache Hudi Blogs on Medium with Pandas - By Soumil Shah, Mar 24th 2023
Weekend Project |Build CDC Pipeline from Microsoft SQL Server into Apache Hudi #1 - By Soumil Shah, Mar 25th 2023
Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 1 - By Soumil Shah, Mar 25th 2023
Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 2 - By Soumil Shah, Mar 25th 2023
Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 3 - By Soumil Shah, Mar 25th 2023
Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 4 - By Soumil Shah, Mar 25th 2023
Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 5 - By Soumil Shah, Mar 25th 2023
Podcast 1: What is Apache Hudi - By Soumil Shah, Mar 25th 2023
Podcast 2: Why Lake house architecture is being adopted by businesses, and how hudi can assist you - By Soumil Shah, Mar 25th 2023
Podcast 3 : Benefits of Using a Transactional Data Lake and How Apache Hudi Can Help - By Soumil Shah, Mar 25th 2023
Podcast: Exploring Copy on Write Table Type in Apache Hudi: Benefits and Use Cases - By Soumil Shah, Mar 25th 2023
Podcast: Maximizing Efficiency with Merge on Read Table Type in Apache Hudi: Benefits and Use Cases - By Soumil Shah, Mar 25th 2023
Podcast: Optimizing Data Lake Performance with Apache Hudi Compaction: Strategies and Benefits - By Soumil Shah, Mar 25th 2023
Podcast: Maximizing Data Management Efficiency with Apache Hudi's Clustering Feature - By Soumil Shah, Mar 25th 2023
Podcast: Importance of Data Governance and Apache Hudi in Managing a Data Lake: The Case of Grofers - By Soumil Shah, Mar 25th 2023
How to use Apache Hudi with AWS Glue Studio Visual Editor | Hands on Lab - By Soumil Shah, Mar 26th 2023
Podcast Uber's Game-Changing Data Management with Apache Hudi Delta Streamer - By Soumil Shah, Mar 30th 2023
Project: Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 1 - By Soumil Shah, Mar 30th 2023
Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 2 - By Soumil Shah, Mar 30th 2023
Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 3 - By Soumil Shah, Mar 30th 2023
Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 4 - By Soumil Shah, Mar 30th 2023
Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 5 - By Soumil Shah, Mar 31st 2023
Learn How to Integrate Apache Hudi with Redshift Spectrum Hands on Labs with Code - By Soumil Shah, April 2nd 2023
Running Apache Hudi Delta Streamer On EMR Serverless Hands on Lab step by step guide - By Soumil Shah, April 4th 2023
Getting Alerts when hudi Delta Streamer Fails with Event Driven Approach using Lambdas &Event Bridge - By Soumil Shah, April 5th 2023
Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #1 - By Soumil Shah, April 6th 2023
Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #2 - By Soumil Shah, April 6th 2023
Advantages of Metadata Indexing and Asynchronous Indexing in Hudi Hands on Lab - By Soumil Shah, April 7th 2023
Understanding Clustering in Apache Hudi and the Benefits of Asynchronous Clustering - By Soumil Shah, April 8th 2023
Bootstrapping in Apache Hudi on EMR Serverless with Lab - By Soumil Shah, April 9th 2023
Journey to Hudi Transactional Data Lake Mastery: How I Learned and Succeeded - By Soumil Shah, April 11th 2023
Learn about Apache Hudi Transformers with Hands on Lab - By Soumil Shah, April 11th 2023
Podcast: Unleashing the Power of Data: The Epic Story Behind Apache Hudi's Creation by Uber - By Soumil Shah, April 11th 2023
Efficient Data Ingestion with Glue Concurrency and Hudi Data Lake - By Soumil Shah, April 12th 2023
Effortlessly Sync Your JDBC Source to Hudi Transactional Datalake: No DMS or Debezium Required! - By Soumil Shah, April 20th 2023
Joining Hudi Raw Tables for Powerful Data Analysis with Spark SQL - By Soumil Shah, April 25th 2023
From Raw Data to Insights: Building a Lake House with Hudi and Star Schema | Step by Step Guide - By Soumil Shah, April 26th 2023
Efficiently Managing Ride & Late Arriving Tips Data with Incremental ETL using Apache Hudi :Hands On - By Soumil Shah, April 29th 2023
Building a Scalable and Resilient Streaming ETL Pipeline with Hudi's Incremental Processing #1 - By Soumil Shah, May 1st 2023
Mastering Slowly Changing Dimension with Hudi: A Step-by-Step Guide to Efficient Data Management| - By Soumil Shah, May 3rd 2023
Build, deploy, and run Spark jobs on Amazon EMR with the open-source EMR CLI tool - By Soumil Shah, May 3rd 2023
How to Build Your Own Version of AWS Glue Bookmark to get Only New Incremental Files - By Soumil Shah, May 6th 2023
Maximizing Efficiency DataLake(Hudi) Glue ETL Jobs with Templated Approach &Serverless Architecture - By Soumil Shah, May 7th 2023
EMR Serverless for Beginners: | Ingest Data incrementally | Submit Spark Job with EMR-CLI |Data lake - By Soumil Shah, May 11th 2023
EMR Serverless Made Easy: Submitting Hive SQL Queries for Beginners with NYC Taxi Dataset - By Soumil Shah, May 13th 2023
Unify Your Event Data:Guide to Mapping Events to Standardized Format with Incremental ETL using Hudi - By Soumil Shah, May 16th 2023
Hands-On Lab: Unleashing Efficiency and Flexibility with Partial Updates in Apache Hudi - By Soumil Shah, May 19th 2023
Mastering File Sizing in Hudi: Boosting Performance and Efficiency - By Soumil Shah, May 20th 2023
How to Set Up AWS Glue Locally with Docker: Accessing Glue Database & Table in Your LocalEnvironment - By Soumil Shah, May 21st 2023
Automate alerting and reporting for AWS Glue job resource usage - By Soumil Shah, May 27th 2023
AWS and Apache Hudi Workshop Overview: Build a ride share lakehouse platform - By Onehouse, May 31st 2023
How to Query Hudi Tables in Incremental Fashion and Get only New data on AWS Glue | Hands on Lab - By Soumil Shah, June 2nd 2023
How to JOIN Hudi Tables in Incremental fashion with DynamoDB in AWS GLue | Hands on Lab for Begineer - By Soumil Shah, June 5th 2023
Learn | How to delete Partition in Apache Hudi on AWS Glue | Hands on - By Soumil Shah, June 7th 2023
How to read data from Multiple Hudi Tables Join them and insert into DynamoDB with AWS Glue - By Soumil Shah, June 10th 2023
SNS + Lambda: How to Trigger Lambda Functions from SNS using Message Filtering - By Soumil Shah, June 16th 2023
Full Workshop Recap: Build a ride-share lakehouse platform - By Nadine Farah and Soumil Shah, June 22nd 2023
Learn About Apache Hudi Pre Commit Validator with Hands on Lab - By Soumil Shah, June 23rd 2023
Building Lakehouse using Hudi | Apache Hudi | Data Lakehouse | Hudi | Apache - By DataCouch, July 1st 2023
Hudi Best Practices: Handling Failed Inserts/Upserts with Error Tables - By Soumil Shah, July 2nd 2023
Incremental Data Extraction from Postgres using Triggers and PySpark - By Soumil Shah, July 9th 2023
How Data Scientist &Data Engineer Can Query Hudi Tables with Athena Spark Notebook for AdhocAnalysis - By Soumil Shah, June 7th 2023
Develop Incremental ETL Pipeline From Hudi Tables to Redshift Using AWS Glue and Spark - By Soumil Shah, July 9th 2023
learn How to use AWS Glue Crawler with Hudi Tables to Catlog the Data - By Soumil Shah, July 22nd 2023
Removing Duplicates in Hudi Partitions with Insert_Overwrite API and Spark SQL - By Soumil Shah, July 28th 2023
Building and Automating Hudi Medallion Architecture with AWS Glue Workflow Hands on Labs StepbyStep - By Soumil Shah, August 1st 2023
Powering Event-Driven Workloads with Hudi Read Stream & AWS Glue Streaming JOBS! - By Soumil Shah, August 3rd 2023
Easy Step by Step Guide for Beginner Setup AWS Transfer Family - SFTP with S3 - By Soumil Shah, August 6th 2023
Easy Step by Step Guide for Beginner Ingest CSV Files into Hudi with AWS GLue | Hands on Labs - By Soumil Shah, August 9th 2023