Video Guides, Tutorials & Hands on labs

  1. "Insert | Update | Delete On Datalake (S3) with Apache Hudi and glue Pyspark" - By Soumil Shah, Nov 17th 2022

  2. "Build a Spark pipeline to analyze streaming data using AWS Glue, Apache Hudi, S3 and Athena" - By Soumil Shah, Nov 19th 2022

  3. "Different table types in Apache Hudi | MOR and COW | Deep Dive | By Sivabalan Narayanan" - By Soumil Shah, Nov 20th 2022

  4. "Simple 5 Steps Guide to get started with Apache Hudi and Glue 4.0 and query the data using Athena" - By Soumil Shah, Dec 8th 2022

  5. "Build Datalakes on S3 with Apache HUDI in a easy way for Beginners with hands on labs | Glue" - By Soumil Shah, Dec 11th 2022

  6. "How to convert Existing data in S3 into Apache Hudi Transaction Datalake with Glue | Hands on Lab" - By Soumil Shah, Dec 14th 2022

  7. "Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and Apache Hudi | Hands on Labs" - By Soumil Shah, Dec 14th 2022

  8. "Hands on Lab with using DynamoDB as lock table for Apache Hudi Data Lakes" - By Soumil Shah, Dec 14th 2022

  9. "Build production Ready Real Time Transaction Hudi Datalake from DynamoDB Streams using Glue &kinesis" - By Soumil Shah, Dec 15th 2022

  10. "Step by Step Guide on Migrate Certain Tables from DB using DMS into Apache Hudi Transaction Datalake" - By Soumil Shah, Dec 17th 2022

  11. "Migrate Certain Tables from ONPREM DB using DMS into Apache Hudi Transaction Datalake with Glue|Demo" - By Soumil Shah, Dec 17th 2022

  12. "Insert|Update|Read|Write|SnapShot| Time Travel |incremental Query on Apache Hudi datalake (S3)" - By Soumil Shah, Dec 18th 2022

  13. "Build Production Ready Alternative Data Pipeline from DynamoDB to Apache Hudi | PROJECT DEMO" - By Soumil Shah, Dec 19th 2022

  14. "Build Production Ready Alternative Data Pipeline from DynamoDB to Apache Hudi | Step by Step Guide" - By Soumil Shah, Dec 19th 2022

  15. "Getting started with Kafka and Glue to Build Real Time Apache Hudi Transaction Datalake" - By Soumil Shah, Dec 20th 2022

  16. "Learn Schema Evolution in Apache Hudi Transaction Datalake with hands on labs" - By Soumil Shah, Dec 21st 2022

  17. "Apache Hudi with DBT Hands on Lab.Transform Raw Hudi tables with DBT and Glue Interactive Session" - By Soumil Shah, Dec 23rd 2022

  18. Apache Hudi on Windows Machine Spark 3.3 and hadoop2.7 Step by Step guide and Installation Process - By Soumil Shah, Dec 24th 2022

  19. Lets Build Streaming Solution using Kafka + PySpark and Apache HUDI Hands on Lab with code - By Soumil Shah, Dec 24th 2022

  20. Bring Data from Source using Debezium with CDC into Kafka&S3Sink &Build Hudi Datalake | Hands on lab - By Soumil Shah, Dec 27th 2022

  21. Comparing Apache Hudi's MOR and COW Tables: Use Cases from Uber - By Soumil Shah, Dec 28th 2022

  22. Step by Step guide how to setup VPC & Subnet & Get Started with HUDI on EMR | Installation Guide | - By Soumil Shah, Dec 30th 2022

  23. Streaming ETL using Apache Flink joining multiple Kinesis streams | Demo - By Soumil Shah, Jan 1st 2023

  24. Transaction Hudi Data Lake with Streaming ETL from Multiple Kinesis Streams & Joining using Flink - By Soumil Shah, Jan 1st 2023

  25. Great Article|Apache Hudi vs Delta Lake vs Apache Iceberg - Lakehouse Feature Comparison by OneHouse - By Soumil Shah, Jan 11th 2023

  26. Build Real Time Streaming Pipeline with Apache Hudi Kinesis and Flink | Hands on Lab - By Soumil Shah, Jan 12th 2023

  27. Build Real Time Low Latency Streaming pipeline from DynamoDB to Apache Hudi using Kinesis,Flink|Lab - By Soumil Shah, Jan 13th 2023

  28. Real Time Streaming Data Pipeline From Aurora Postgres to Hudi with DMS , Kinesis and Flink |DEMO - By Soumil Shah, Jan 15th 2023

  29. Real Time Streaming Pipeline From Aurora Postgres to Hudi with DMS , Kinesis and Flink |Hands on Lab - By Soumil Shah, Jan 16th 2023

  30. Leverage Apache Hudi upsert to remove duplicates on a data lake | Hudi Labs - By Soumil Shah, Jan 17th 2023

  31. Use Apache Hudi for hard deletes on your data lake for data governance | Hudi Labs - By Soumil Shah, Jan 17th 2023

  32. How businesses use Hudi Soft delete features to do soft delete instead of hard delete on Datalake - By Soumil Shah, Jan 17th 2023

  33. Leverage Apache Hudi incremental query to process new & updated data | Hudi Labs - By Soumil Shah, Jan 17th 2023

  34. Global Bloom Index: Remove duplicates & guarantee uniquness | Hudi Labs - By Soumil Shah, Jan 17th 2023

  35. Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs - By Soumil Shah, Jan 17th 2023

  36. Precomb Key Overview: Avoid dedupes | Hudi Labs - By Soumil Shah, Jan 17th 2023

  37. How do I identify Schema Changes in Hudi Tables and Send Email Alert when New Column added/removed - By Soumil Shah, Jan 20th 2023

  38. How to detect and Mask PII data in Apache Hudi Data Lake | Hands on Lab - By Soumil Shah, Jan 21st 2023

  39. Writing data quality and validation scripts for a Hudi data lake with AWS Glue and pydeequ| Hands on Lab - By Soumil Shah, Jan 23rd 2023

  40. Learn How to restrict Intern from accessing Certain Column in Hudi Datalake with lake Formation - By Soumil Shah, Jan 28th 2023

  41. How do I Ingest Extremely Small Files into Hudi Data lake with Glue Incremental data processing - By Soumil Shah, Feb 7th 2023

  42. Create Your Hudi Transaction Datalake on S3 with EMR Serverless for Beginners in fun and easy way - By Soumil Shah, Feb 11th 2023

  43. Streaming Ingestion from MongoDB into Hudi with Glue, kinesis&Event bridge&MongoStream Hands on labs - By Soumil Shah, Feb 18th 2023

  44. Apache Hudi Bulk Insert Sort Modes a summary of two incredible blogs - By Soumil Shah, Feb 21st 2023

  45. Apache Hudi Bulk Insert Sort Modes a summary of two incredible blogs - By Soumil Shah, Feb 21st 2023

  46. Use Glue 4.0 to take regular save points for your Hudi tables for backup or disaster Recovery - By Soumil Shah, Feb 22nd 2023

  47. RFC-51 Change Data Capture in Apache Hudi like Debezium and AWS DMS Hands on Labs - By Soumil Shah, Feb 25th 2023

  48. Python helper class which makes querying incremental data from Hudi Data lakes easy - By Soumil Shah, Feb 26th 2023

  49. Develop Incremental Pipeline with CDC from Hudi to Aurora Postgres | Demo Video - By Soumil Shah, Mar 4th 2023

  50. Power your Down Stream ElasticSearch Stack From Apache Hudi Transaction Datalake with CDC|Demo Video - By Soumil Shah, Mar 6th 2023

  51. Power your Down Stream Elastic Search Stack From Apache Hudi Transaction Datalake with CDC|DeepDive - By Soumil Shah, Mar 6th 2023

  52. How to Rollback to Previous Checkpoint during Disaster in Apache Hudi using Glue 4.0 Demo - By Soumil Shah, Mar 7th 2023

  53. How do I read data from Cross Account S3 Buckets and Build Hudi Datalake in Datateam Account - By Soumil Shah, Mar 11th 2023

  54. Query cross-account Hudi Glue Data Catalogs using Amazon Athena - By Soumil Shah, Mar 11th 2023

  55. Learn About Bucket Index (SIMPLE) In Apache Hudi with lab - By Soumil Shah, Mar 15th 2023

  56. Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi - By Soumil Shah, Mar 17th 2023

  57. Push Hudi Commit Notification TO HTTP URI with Callback - By Soumil Shah, Mar 18th 2023

  58. RFC - 18: Insert Overwrite in Apache Hudi with Example - By Soumil Shah, Mar 19th 2023

  59. RFC 42: Consistent Hashing in APache Hudi MOR Tables - By Soumil Shah, Mar 21st 2023

  60. Data Analysis for Apache Hudi Blogs on Medium with Pandas - By Soumil Shah, Mar 24th 2023

  61. How to scrape all Blogs about a topic from medium like pro with Python - By Soumil Shah, Mar 24th 2023