Talks & Hudi Users

Adoption

Uber

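Hudi was originally developed at Uber to achieve low-latency, highly efficient database ingestion. It has been in production since August 2016, powering about 100 highly business-critical tables on Hadoop that hold several hundred TB of data (the top 10 include trips, riders, and drivers). Hudi also powers several incremental Hive ETL pipelines and is now integrated into Uber's data dispersal system.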

EMIS Health

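EMIS Health is the largest provider of primary care IT software in the UK, with datasets that include more than 500 billion healthcare records. Hudi is used to manage their analytics datasets in production and keep them in sync with the upstream sources. Presto is used to query the data written in Hudi format.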

Yields.io

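Yields.io is the first FinTech platform to use AI for automated model validation and real-time monitoring at enterprise scale. Their data lake is managed by Hudi, and they are also actively using Hudi to build infrastructure for incremental, cross-language/platform machine learning.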

Yotpo

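Hudi has several uses at Yotpo. First, Hudi is integrated into their open source ETL framework as the output writer for a CDC pipeline, in which events generated from database binlogs are streamed to Kafka and then written to S3.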

Talks & Presentations

  1. “Hoodie: Incremental processing on Hadoop at Uber” - By Vinoth Chandar & Prasanna Rajaperumal Mar 2017, Strata + Hadoop World, San Jose, CA

  2. “Hoodie: An Open Source Incremental Processing Framework From Uber” - By Vinoth Chandar. Apr 2017, DataEngConf, San Francisco, CA

  3. “Incremental Processing on Large Analytical Datasets” - By Prasanna Rajaperumal June 2017, Spark Summit 2017, San Francisco, CA.

  4. “Hudi: Unifying storage and serving for batch and near-real-time analytics” - By Nishith Agarwal & Balaji Varadarajan September 2018, Strata Data Conference, New York, NY

  5. “Hudi: Large-Scale, Near Real-Time Pipelines at Uber” - By Vinoth Chandar & Nishith Agarwal October 2018, Spark+AI Summit Europe, London, UK

  6. “Powering Uber’s global network analytics pipelines in real-time with Apache Hudi” - By Ethan Guo & Nishith Agarwal, April 2019, Data Council SF19, San Francisco, CA.

  7. “Building highly efficient data lakes using Apache Hudi (Incubating)” - By Vinoth Chandar June 2019, SF Big Analytics Meetup, San Mateo, CA

  8. “Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data Lake Architectures” - By Vinoth Chandar & Balaji Varadarajan September 2019, ApacheCon NA 19, Las Vegas, NV, USA

  9. “Insert, upsert, and delete data in Amazon S3 using Amazon EMR” - By Paul Codding & Vinoth Chandar December 2019, AWS re:Invent 2019, Las Vegas, NV, USA

  10. “Building Robust CDC Pipeline With Apache Hudi And Debezium” - By Pratyaksh, Purushotham, Syed and Shaik December 2019, Hadoop Summit Bangalore, India

  11. “Using Apache Hudi to build the next-generation data lake and its application in medical big data” - By JingHuang & Leesf March 2020, Apache Hudi & Apache Kylin Online Meetup, China

  12. “Building a near real-time, high-performance data warehouse based on Apache Hudi and Apache Kylin” - By ShaoFeng Shi March 2020, Apache Hudi & Apache Kylin Online Meetup, China

  13. “Building large scale, transactional data lakes using Apache Hudi” - By Nishith Agarwal, June 2020, Berlin Buzzwords 2020.

  14. “Apache Hudi - Design/Code Walkthrough Session for Contributors” - By Vinoth Chandar, July 2020, Hudi community.

  15. “PrestoDB and Apache Hudi” - By Bhavani Sudha Saktheeswaran and Brandon Scheller, Aug 2020, PrestoDB Community Meetup.

  16. “Panel Discussion on Presto Ecosystem” - By Vinoth Chandar, Sep 2020, PrestoCon “panel”.

Articles

  1. “The Case for incremental processing on Hadoop” - O’Reilly Ideas article by Vinoth Chandar
  2. “Hoodie: Uber Engineering’s Incremental Processing Framework on Hadoop” - Engineering Blog By Prasanna Rajaperumal
  3. “New – Insert, Update, Delete Data on S3 with Amazon EMR and Apache Hudi” - AWS Blog by Danilo Poccia
  4. “The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project” - ASF Graduation announcement
  5. “Apache Hudi grows cloud data lake maturity”
  6. “Building a Large-scale Transactional Data Lake at Uber Using Apache Hudi” - Uber eng blog by Nishith Agarwal
  7. “Hudi On Hops” - By NETSANET GEBRETSADKAN KIDANE
  8. “开源数据湖存储框架 Apache Hudi 如何玩转增量处理” - InfoQ CN article by Yanghua