Video Guides, Tutorials & Hands on labs

  1. "Insert | Update | Delete On Datalake (S3) with Apache Hudi and glue Pyspark" - By Soumil Shah, Nov 17th 2022

  2. "Build a Spark pipeline to analyze streaming data using AWS Glue, Apache Hudi, S3 and Athena" - By Soumil Shah, Nov 19th 2022

  3. "Different table types in Apache Hudi | MOR and COW | Deep Dive | By Sivabalan Narayanan" - By Soumil Shah, Nov 20th 2022

  4. "Simple 5 Steps Guide to get started with Apache Hudi and Glue 4.0 and query the data using Athena" - By Soumil Shah, Dec 8th 2022

  5. "Build Datalakes on S3 with Apache HUDI in a easy way for Beginners with hands on labs | Glue" - By Soumil Shah, Dec 11th 2022

  6. "How to convert Existing data in S3 into Apache Hudi Transaction Datalake with Glue | Hands on Lab" - By Soumil Shah, Dec 14th 2022

  7. "Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and Apache Hudi | Hands on Labs" - By Soumil Shah, Dec 14th 2022

  8. "Hands on Lab with using DynamoDB as lock table for Apache Hudi Data Lakes" - By Soumil Shah, Dec 14th 2022

  9. "Build production Ready Real Time Transaction Hudi Datalake from DynamoDB Streams using Glue &kinesis" - By Soumil Shah, Dec 15th 2022

  10. "Step by Step Guide on Migrate Certain Tables from DB using DMS into Apache Hudi Transaction Datalake" - By Soumil Shah, Dec 17th 2022

  11. "Migrate Certain Tables from ONPREM DB using DMS into Apache Hudi Transaction Datalake with Glue|Demo" - By Soumil Shah, Dec 17th 2022

  12. "Insert|Update|Read|Write|SnapShot| Time Travel |incremental Query on Apache Hudi datalake (S3)" - By Soumil Shah, Dec 18th 2022

  13. "Build Production Ready Alternative Data Pipeline from DynamoDB to Apache Hudi | PROJECT DEMO" - By Soumil Shah, Dec 19th 2022

  14. "Build Production Ready Alternative Data Pipeline from DynamoDB to Apache Hudi | Step by Step Guide" - By Soumil Shah, Dec 19th 2022

  15. "Getting started with Kafka and Glue to Build Real Time Apache Hudi Transaction Datalake" - By Soumil Shah, Dec 20th 2022

  16. "Learn Schema Evolution in Apache Hudi Transaction Datalake with hands on labs" - By Soumil Shah, Dec 21st 2022

  17. "Apache Hudi with DBT Hands on Lab.Transform Raw Hudi tables with DBT and Glue Interactive Session" - By Soumil Shah, Dec 23rd 2022

  18. Apache Hudi on Windows Machine Spark 3.3 and hadoop2.7 Step by Step guide and Installation Process - By Soumil Shah, Dec 24th 2022

  19. Lets Build Streaming Solution using Kafka + PySpark and Apache HUDI Hands on Lab with code - By Soumil Shah, Dec 24th 2022

  20. Bring Data from Source using Debezium with CDC into Kafka&S3Sink &Build Hudi Datalake | Hands on lab - By Soumil Shah, Dec 27th 2022

  21. Comparing Apache Hudi's MOR and COW Tables: Use Cases from Uber - By Soumil Shah, Dec 28th 2022

  22. Step by Step guide how to setup VPC & Subnet & Get Started with HUDI on EMR | Installation Guide | - By Soumil Shah, Dec 30th 2022

  23. Streaming ETL using Apache Flink joining multiple Kinesis streams | Demo - By Soumil Shah, Jan 1st 2023

  24. Transaction Hudi Data Lake with Streaming ETL from Multiple Kinesis Streams & Joining using Flink - By Soumil Shah, Jan 1st 2023

  25. Great Article|Apache Hudi vs Delta Lake vs Apache Iceberg - Lakehouse Feature Comparison by OneHouse - By Soumil Shah, Jan 11th 2023

  26. Build Real Time Streaming Pipeline with Apache Hudi Kinesis and Flink | Hands on Lab - By Soumil Shah, Jan 12th 2023

  27. Build Real Time Low Latency Streaming pipeline from DynamoDB to Apache Hudi using Kinesis,Flink|Lab - By Soumil Shah, Jan 13th 2023

  28. Real Time Streaming Data Pipeline From Aurora Postgres to Hudi with DMS , Kinesis and Flink |DEMO - By Soumil Shah, Jan 15th 2023

  29. Real Time Streaming Pipeline From Aurora Postgres to Hudi with DMS , Kinesis and Flink |Hands on Lab - By Soumil Shah, Jan 16th 2023

  30. Leverage Apache Hudi upsert to remove duplicates on a data lake | Hudi Labs - By Soumil Shah, Jan 17th 2023

  31. Use Apache Hudi for hard deletes on your data lake for data governance | Hudi Labs - By Soumil Shah, Jan 17th 2023

  32. How businesses use Hudi Soft delete features to do soft delete instead of hard delete on Datalake - By Soumil Shah, Jan 17th 2023

  33. Leverage Apache Hudi incremental query to process new & updated data | Hudi Labs - By Soumil Shah, Jan 17th 2023

  34. Global Bloom Index: Remove duplicates & guarantee uniquness | Hudi Labs - By Soumil Shah, Jan 17th 2023

  35. Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs - By Soumil Shah, Jan 17th 2023

  36. Precomb Key Overview: Avoid dedupes | Hudi Labs - By Soumil Shah, Jan 17th 2023

  37. How do I identify Schema Changes in Hudi Tables and Send Email Alert when New Column added/removed - By Soumil Shah, Jan 20th 2023

  38. How to detect and Mask PII data in Apache Hudi Data Lake | Hands on Lab - By Soumil Shah, Jan 21st 2023

  39. Writing data quality and validation scripts for a Hudi data lake with AWS Glue and pydeequ| Hands on Lab - By Soumil Shah, Jan 23rd 2023

  40. Learn How to restrict Intern from accessing Certain Column in Hudi Datalake with lake Formation - By Soumil Shah, Jan 28th 2023

  41. How do I Ingest Extremely Small Files into Hudi Data lake with Glue Incremental data processing - By Soumil Shah, Feb 7th 2023

  42. Create Your Hudi Transaction Datalake on S3 with EMR Serverless for Beginners in fun and easy way - By Soumil Shah, Feb 11th 2023

  43. Streaming Ingestion from MongoDB into Hudi with Glue, kinesis&Event bridge&MongoStream Hands on labs - By Soumil Shah, Feb 18th 2023

  44. Apache Hudi Bulk Insert Sort Modes a summary of two incredible blogs - By Soumil Shah, Feb 21st 2023

  45. Apache Hudi Bulk Insert Sort Modes a summary of two incredible blogs - By Soumil Shah, Feb 21st 2023

  46. Use Glue 4.0 to take regular save points for your Hudi tables for backup or disaster Recovery - By Soumil Shah, Feb 22nd 2023

  47. RFC-51 Change Data Capture in Apache Hudi like Debezium and AWS DMS Hands on Labs - By Soumil Shah, Feb 25th 2023

  48. Python helper class which makes querying incremental data from Hudi Data lakes easy - By Soumil Shah, Feb 26th 2023

  49. Develop Incremental Pipeline with CDC from Hudi to Aurora Postgres | Demo Video - By Soumil Shah, Mar 4th 2023

  50. Power your Down Stream ElasticSearch Stack From Apache Hudi Transaction Datalake with CDC|Demo Video - By Soumil Shah, Mar 6th 2023

  51. Power your Down Stream Elastic Search Stack From Apache Hudi Transaction Datalake with CDC|DeepDive - By Soumil Shah, Mar 6th 2023

  52. How to Rollback to Previous Checkpoint during Disaster in Apache Hudi using Glue 4.0 Demo - By Soumil Shah, Mar 7th 2023

  53. How do I read data from Cross Account S3 Buckets and Build Hudi Datalake in Datateam Account - By Soumil Shah, Mar 11th 2023

  54. Query cross-account Hudi Glue Data Catalogs using Amazon Athena - By Soumil Shah, Mar 11th 2023

  55. Learn About Bucket Index (SIMPLE) In Apache Hudi with lab - By Soumil Shah, Mar 15th 2023

  56. Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi - By Soumil Shah, Mar 17th 2023

  57. Push Hudi Commit Notification TO HTTP URI with Callback - By Soumil Shah, Mar 18th 2023

  58. RFC - 18: Insert Overwrite in Apache Hudi with Example - By Soumil Shah, Mar 19th 2023

  59. RFC 42: Consistent Hashing in Apache Hudi MOR Tables - By Soumil Shah, Mar 21st 2023

  60. Data Analysis for Apache Hudi Blogs on Medium with Pandas - By Soumil Shah, Mar 24th 2023

  61. Weekend Project |Build CDC Pipeline from Microsoft SQL Server into Apache Hudi #1 - By Soumil Shah, Mar 25th 2023

  62. Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 1 - By Soumil Shah, Mar 25th 2023

  63. Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 2 - By Soumil Shah, Mar 25th 2023

  64. Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 3 - By Soumil Shah, Mar 25th 2023

  65. Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 4 - By Soumil Shah, Mar 25th 2023

  66. Build CDC Pipeline from Microsoft SQL Server into Apache Hudi with AWS DMS | PART 5 - By Soumil Shah, Mar 25th 2023

  67. Podcast 1: What is Apache Hudi - By Soumil Shah, Mar 25th 2023

  68. Podcast 2: Why Lake house architecture is being adopted by businesses, and how hudi can assist you - By Soumil Shah, Mar 25th 2023

  69. Podcast 3 : Benefits of Using a Transactional Data Lake and How Apache Hudi Can Help - By Soumil Shah, Mar 25th 2023

  70. Podcast: Exploring Copy on Write Table Type in Apache Hudi: Benefits and Use Cases - By Soumil Shah, Mar 25th 2023

  71. Podcast: Maximizing Efficiency with Merge on Read Table Type in Apache Hudi: Benefits and Use Cases - By Soumil Shah, Mar 25th 2023

  72. Podcast: Optimizing Data Lake Performance with Apache Hudi Compaction: Strategies and Benefits - By Soumil Shah, Mar 25th 2023

  73. Podcast: Maximizing Data Management Efficiency with Apache Hudi's Clustering Feature - By Soumil Shah, Mar 25th 2023

  74. Podcast: Importance of Data Governance and Apache Hudi in Managing a Data Lake: The Case of Grofers - By Soumil Shah, Mar 25th 2023

  75. How to use Apache Hudi with AWS Glue Studio Visual Editor | Hands on Lab - By Soumil Shah, Mar 26th 2023

  76. Podcast Uber's Game-Changing Data Management with Apache Hudi Delta Streamer - By Soumil Shah, Mar 30th 2023

  77. Project: Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 1 - By Soumil Shah, Mar 30th 2023

  78. Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 2 - By Soumil Shah, Mar 30th 2023

  79. Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 3 - By Soumil Shah, Mar 30th 2023

  80. Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 4 - By Soumil Shah, Mar 30th 2023

  81. Project : Using Apache Hudi Deltastreamer and AWS DMS Hands on Lab# Part 5 - By Soumil Shah, Mar 31st 2023

  82. Learn How to Integrate Apache Hudi with Redshift Spectrum Hands on Labs with Code - By Soumil Shah, April 2nd 2023

  83. Running Apache Hudi Delta Streamer On EMR Serverless Hands on Lab step by step guide - By Soumil Shah, April 4th 2023

  84. Getting Alerts when hudi Delta Streamer Fails with Event Driven Approach using Lambdas &Event Bridge - By Soumil Shah, April 5th 2023

  85. Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #1 - By Soumil Shah, April 6th 2023

  86. Efficient Data Lake Management with Apache Hudi Cleaner: Benefits of Scheduling Data Cleaning #2 - By Soumil Shah, April 6th 2023

  87. Advantages of Metadata Indexing and Asynchronous Indexing in Hudi Hands on Lab - By Soumil Shah, April 7th 2023

  88. Understanding Clustering in Apache Hudi and the Benefits of Asynchronous Clustering - By Soumil Shah, April 8th 2023

  89. Bootstrapping in Apache Hudi on EMR Serverless with Lab - By Soumil Shah, April 9th 2023

  90. Journey to Hudi Transactional Data Lake Mastery: How I Learned and Succeeded - By Soumil Shah, April 11th 2023

  91. Learn about Apache Hudi Transformers with Hands on Lab - By Soumil Shah, April 11th 2023

  92. Podcast: Unleashing the Power of Data: The Epic Story Behind Apache Hudi's Creation by Uber - By Soumil Shah, April 11th 2023

  93. Efficient Data Ingestion with Glue Concurrency and Hudi Data Lake - By Soumil Shah, April 12th 2023

  94. Effortlessly Sync Your JDBC Source to Hudi Transactional Datalake: No DMS or Debezium Required! - By Soumil Shah, April 20th 2023

  95. Joining Hudi Raw Tables for Powerful Data Analysis with Spark SQL - By Soumil Shah, April 25th 2023

  96. From Raw Data to Insights: Building a Lake House with Hudi and Star Schema | Step by Step Guide - By Soumil Shah, April 26th 2023

  97. Efficiently Managing Ride & Late Arriving Tips Data with Incremental ETL using Apache Hudi :Hands On - By Soumil Shah, April 29th 2023

  98. Building a Scalable and Resilient Streaming ETL Pipeline with Hudi's Incremental Processing #1 - By Soumil Shah, May 1st 2023

  99. Mastering Slowly Changing Dimension with Hudi: A Step-by-Step Guide to Efficient Data Management| - By Soumil Shah, May 3rd 2023

  100. Build, deploy, and run Spark jobs on Amazon EMR with the open-source EMR CLI tool - By Soumil Shah, May 3rd 2023

  101. How to Build Your Own Version of AWS Glue Bookmark to get Only New Incremental Files - By Soumil Shah, May 6th 2023

  102. Maximizing Efficiency DataLake(Hudi) Glue ETL Jobs with Templated Approach &Serverless Architecture - By Soumil Shah, May 7th 2023

  103. EMR Serverless for Beginners: | Ingest Data incrementally | Submit Spark Job with EMR-CLI |Data lake - By Soumil Shah, May 11th 2023

  104. EMR Serverless Made Easy: Submitting Hive SQL Queries for Beginners with NYC Taxi Dataset - By Soumil Shah, May 13th 2023

  105. Unify Your Event Data:Guide to Mapping Events to Standardized Format with Incremental ETL using Hudi - By Soumil Shah, May 16th 2023

  106. Hands-On Lab: Unleashing Efficiency and Flexibility with Partial Updates in Apache Hudi - By Soumil Shah, May 19th 2023

  107. Mastering File Sizing in Hudi: Boosting Performance and Efficiency - By Soumil Shah, May 20th 2023

  108. How to Set Up AWS Glue Locally with Docker: Accessing Glue Database & Table in Your LocalEnvironment - By Soumil Shah, May 21st 2023

  109. Automate alerting and reporting for AWS Glue job resource usage - By Soumil Shah, May 27th 2023

  110. AWS and Apache Hudi Workshop Overview: Build a ride share lakehouse platform - By Onehouse, May 31st 2023

  111. How to Query Hudi Tables in Incremental Fashion and Get only New data on AWS Glue | Hands on Lab - By Soumil Shah, June 2nd 2023

  112. How to JOIN Hudi Tables in Incremental fashion with DynamoDB in AWS GLue | Hands on Lab for Begineer - By Soumil Shah, June 5th 2023

  113. Learn | How to delete Partition in Apache Hudi on AWS Glue | Hands on - By Soumil Shah, June 7th 2023

  114. How to read data from Multiple Hudi Tables Join them and insert into DynamoDB with AWS Glue - By Soumil Shah, June 10th 2023

  115. SNS + Lambda: How to Trigger Lambda Functions from SNS using Message Filtering - By Soumil Shah, June 16th 2023

  116. Full Workshop Recap: Build a ride-share lakehouse platform - By Nadine Farah and Soumil Shah, June 22nd 2023

  117. Learn About Apache Hudi Pre Commit Validator with Hands on Lab - By Soumil Shah, June 23rd 2023

  118. Building Lakehouse using Hudi | Apache Hudi | Data Lakehouse | Hudi | Apache - By DataCouch, July 1st 2023

  119. Hudi Best Practices: Handling Failed Inserts/Upserts with Error Tables - By Soumil Shah, July 2nd 2023

  120. Hudi Best Practices: Handling Failed Inserts/Upserts with Error Tables - By Soumil Shah, July 2nd 2023

  121. Incremental Data Extraction from Postgres using Triggers and PySpark - By Soumil Shah, July 9th 2023

  122. How Data Scientist &Data Engineer Can Query Hudi Tables with Athena Spark Notebook for AdhocAnalysis - By Soumil Shah, June 7th 2023

  123. Develop Incremental ETL Pipeline From Hudi Tables to Redshift Using AWS Glue and Spark - By Soumil Shah, July 9th 2023

  124. learn How to use AWS Glue Crawler with Hudi Tables to Catlog the Data - By Soumil Shah, July 22nd 2023

  125. Removing Duplicates in Hudi Partitions with Insert_Overwrite API and Spark SQL - By Soumil Shah, July 28th 2023

  126. Building and Automating Hudi Medallion Architecture with AWS Glue Workflow Hands on Labs StepbyStep - By Soumil Shah, August 1st 2023

  127. Powering Event-Driven Workloads with Hudi Read Stream & AWS Glue Streaming JOBS! - By Soumil Shah, August 3rd 2023

  128. Easy Step by Step Guide for Beginner Setup AWS Transfer Family - SFTP with S3 - By Soumil Shah, August 6th 2023

  129. Easy Step by Step Guide for Beginner Ingest CSV Files into Hudi with AWS GLue | Hands on Labs - By Soumil Shah, August 9th 2023