Who's Using
Apache Hudi is a fast-growing, diverse community of people and organizations from all around the globe. The following is a small sample of companies that have adopted or contributed to Apache Hudi. Join us on Slack, or come to one of our virtual community events.
37 Interactive Entertainment
37 Interactive Entertainment is a global top-20 listed game company and a leading company on China's A-share market. Apache Hudi is integrated into our Data Middle Platform, offering a real-time data warehouse and solving the problem of frequently changing data. We have also built a set of data access standards based on Hudi, which provides a guarantee for massive data queries in game operation scenarios.
Alibaba Cloud
Alibaba Cloud provides cloud computing services to online businesses and Alibaba's own e-commerce ecosystem. Apache Hudi is integrated into Alibaba Cloud Data Lake Analytics, offering real-time analysis on Hudi datasets.
Amazon
Amazon Transportation service uses Apache Hudi as the backbone of its package delivery system, powering petabyte-scale, near real-time analytics.
Amazon Web Services
Amazon Web Services is the world's leading cloud services provider. Apache Hudi is pre-installed with the Amazon EMR (Elastic MapReduce) offering, providing a means for AWS users to perform record-level updates/deletes and manage storage efficiently.
Applied Intuition
Applied Intuition accelerates the world’s adoption of safe and intelligent machines. Using Hudi, Applied supports querying across millions of simulations and hours of drive data for their customers. Its table features drastically improve querying performance while maintaining near real-time freshness.
ByteDance
ByteDance uses Apache Hudi to power their exabyte-scale TikTok #ForYouPage real-time recommendation engine.
Clinbrain
Clinbrain is the leader in big data platforms and their use in the medical industry. We have built 200 medical big data centers by integrating the Hudi data lake solution in numerous hospitals. Hudi provides the ability to upsert and delete on HDFS, and it also keeps fresh data streams up to date efficiently in the Hadoop ecosystem through the Hudi incremental view.
DiDi
DiDi is the world's leading transportation platform. Building on the Hadoop ecosystem, we built a new-generation big data platform based on Apache Hudi, which provides record-level updates/deletes as well as integrated streaming and batch data processing.
Disney+ Hotstar
Disney shared how they migrated CDC data to Apache Hudi to power a real-time ads platform for their streaming service.
EMIS Health
EMIS Health is the largest provider of Primary Care IT software in the UK, with datasets including more than 500Bn healthcare records. Hudi is used to manage their analytics datasets in production and keep them up to date with their upstream sources. Presto is used to query the data written in Hudi format.
Forethought AI
Forethought AI is the leading generative AI platform for customer support automation. Its automation platform lowers support costs while providing top-tier service in every customer interaction.
Funding Circle
Funding Circle helps people invest in equipment or inventory, consolidate debt, grow their business, or pay business expenses by providing loans.
GE Aviation
GE Aviation built cloud-native data pipelines at enterprise scale using Apache Hudi on AWS.
Grofers
Grofers is a grocery delivery provider operating across the APAC region. Grofers has integrated Hudi into its central pipelines for replicating backend database CDC into the warehouse.
H3C Digital Platform
The H3C digital platform provides end-to-end capabilities for data collection, storage, computation, and governance, and enables data center construction and data governance for the medical, smart park, smart city, and other industries. Apache Hudi is integrated into the digital platform to meet the real-time update needs of massive data.
Halodoc
Lake House Architecture at Halodoc: Data Platform 2.0
JD
JD is the largest retailer in China, a member of the NASDAQ-100, and a Fortune Global 500 company. Apache Hudi is integrated into JD Group's Data Platform, where it provides record-level mutations and incremental query abilities. Hudi storage at JD currently exceeds 125 petabytes (PB) and is applied in significant scenarios, resulting in substantial latency improvements and cost reductions. Hudi will play a crucial role in JD's lakehouse construction.
Jobtarget
Jobtarget is a company dedicated to helping job seekers and employers connect. This focus has allowed us to earn the opportunity to serve thousands of companies and millions of job seekers each month.
Kyligence
Kyligence is the leading big data analytics platform company. We've built end-to-end solutions for various Fortune Global 500 companies in the US and China. We adopted Apache Hudi in our cloud solution on AWS in 2019. With the help of Hudi, we are able to process upserts and deletes easily, and we use incremental views to build efficient data pipelines in AWS. Hudi datasets can also be integrated into Kyligence Cloud directly for high-concurrency OLAP access.
Kuaishou
Kuaishou is a prominent Chinese content community and social platform. The company offers a diverse range of services, including live streaming, short videos, and e-commerce. At Kuaishou, both AI experts and BI technologists leverage Hudi to handle their online business cases. In AI scenarios, Hudi is used to build a unified batch-stream sample lake at exabyte (EB) scale with an end-to-end latency of 30 seconds. The sample lake supports both online and offline training. In BI scenarios, Hudi enhances the traditional Hive data warehouse by enabling streaming ingestion into the lakehouse at the ODS layer. It efficiently processes append and upsert streams, while also supporting incremental updates across the DWD, DWS, and ADS layers. Currently, Hudi plays a key role in both BI and AI scenarios at Kuaishou.
Lingyue-digital Corporation
Lingyue-digital Corporation belongs to BMW Group. Apache Hudi is used to ingest change data capture (CDC) data from MySQL and PostgreSQL. We build upsert scenarios on Hadoop and Spark.
Logical Clocks
The Hopsworks 1.x series supports Apache Hudi feature groups to enable upserts and time travel.
Navi
Navi, one of India’s fastest-growing financial service providers, offers Personal & Home Loans, UPI, Insurance, Mutual Funds, and Gold. Powered by Apache Hudi, our Data Platform tech stack has been enhanced to enable near-real-time data ingestion, driving AI/ML initiatives and critical business decisions to create exceptional customer experiences.
NerdWallet
NerdWallet uses AWS and Apache Hudi to build a serverless, real-time analytics platform.
Notion
Notion is the connected workspace where modern teams create and share docs, take notes, manage their projects and time, and organize knowledge — with AI integrated throughout. Notion uses Apache Hudi to power their core analytics datasets and AI features.
Penn Interactive
The interactive arm of Penn Entertainment, encompassing Penn Interactive and theScore, is a major player in online and retail sports betting markets in the U.S. and Canada, as well as the sports media sector. Apache Hudi's efficient processing helps them keep Spark resource usage reasonable while running hundreds of streaming jobs 24/7.
Robinhood
RDS data lake at Robinhood using Apache Hudi
SF-Express
SF-Express is the leading logistics service provider in China. Hudi is used to build a real-time data warehouse, providing real-time computing solutions with higher efficiency and lower cost for our business.
Tathastu.ai
Tathastu.ai offers the largest AI/ML playground of consumer data for data scientists, AI experts, and technologists to build upon. They have built a CDC pipeline using Apache Hudi and Debezium. Data from Hudi datasets is queried using Hive, Presto, and Spark.
Tencent
EMR from Tencent Cloud has integrated Hudi as one of its big data components since V2.2.0. Using Hudi, end users can handle either read-heavy or write-heavy use cases, and Hudi manages the underlying data stored on HDFS/COS/CHDFS using Apache Parquet and Apache Avro.
Uber
Apache Hudi was originally developed at Uber to achieve low-latency database ingestion with high efficiency. It has been in production since Aug 2016, powering the massive 100PB data lake, including highly business-critical tables like core trips, riders, and partners. It also powers several incremental Hive ETL pipelines and is currently being integrated into Uber's data dispersal system.
Udemy
At Udemy, Apache Hudi on AWS EMR is used to ingest MySQL change data capture (CDC) data.
Walmart
Walmart chose Apache Hudi to manage their data lake of store transactions.
Yields.io
Yields.io is the first FinTech platform that uses AI for automated model validation and real-time monitoring on an enterprise-wide scale. Their data lake is managed by Hudi. They are also actively building their infrastructure for incremental, cross-language/platform machine learning using Hudi.
Yotpo
Yotpo uses Hudi in several ways. First, Hudi is integrated as a writer in their open source ETL framework, Metorikku. It is also used as an output writer for a CDC pipeline, in which events generated from database binlog streams flow to Kafka and are then written to S3.
Zendesk
At Zendesk, Apache Hudi is used to build a data lake on AWS.
ZTO Express
ZTO Express is a large group company integrating express delivery, logistics, and other businesses. ZTO uses Apache Hudi to build a technical architecture that integrates data lakes and warehouses. As the core of this architecture, Hudi has helped ZTO achieve quasi-real-time update and analysis capabilities for massive data.