Timeline
At its core, Hudi maintains a timeline, a log of all actions performed on the table at different instants of time. The timeline helps provide instantaneous views of the table while also efficiently supporting retrieval of data in order of arrival. A Hudi instant consists of the following components:
Instant action
: Type of action performed on the table

Instant time
: Typically a timestamp (e.g., 20190117010349) that increases monotonically in the order of the action's begin time

State
: Current state of the instant
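To make the three components concrete, here is a minimal Python sketch of an instant as a plain record. It is not Hudi's own API: the `Instant` class, the `in_timeline_order` helper, and the action and state strings used below are hypothetical names chosen for illustration. The only thing taken from the text above is the shape of an instant and the `yyyyMMddHHmmss`-style timestamp shown in the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Instant:
    """Illustrative record of one timeline entry (not a Hudi class)."""
    action: str  # type of action performed on the table
    time: str    # instant time, e.g. "20190117010349"
    state: str   # current state of the instant

    def begin_time(self) -> datetime:
        # Assumes the 14-digit yyyyMMddHHmmss layout from the example above.
        return datetime.strptime(self.time, "%Y%m%d%H%M%S")


def in_timeline_order(instants):
    # Fixed-width digit strings sort lexicographically in the same
    # order as the timestamps they encode.
    return sorted(instants, key=lambda i: i.time)


# Hypothetical entries; action and state values are assumptions.
timeline = [
    Instant("clean", "20190117010512", "completed"),
    Instant("commit", "20190117010349", "completed"),
]
ordered = in_timeline_order(timeline)
```

Because instant times are fixed-width and monotonically increasing, a plain string sort is enough to recover the order in which actions began.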
Hudi guarantees that the actions performed on the timeline are atomic and timeline-consistent based on the instant time.
Atomicity is achieved by relying on atomic puts to the underlying storage to move write operations through the various states in the timeline.
On the underlying DFS (for S3/Cloud Storage, via an atomic PUT operation), each transition can be observed as a file of the pattern <instant>.<action>.<state> in Hudi's timeline.
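As a rough illustration of the naming pattern, the following Python sketch splits a file name of the form `<instant>.<action>.<state>` into its three parts. It follows the pattern exactly as stated above; file names in a real Hudi table may deviate from it (for example by omitting a component), so treat this as a sketch, not a parser for actual tables. `TimelineFile`, `parse_timeline_file`, and the sample file name are hypothetical.

```python
from typing import NamedTuple


class TimelineFile(NamedTuple):
    """Parts of a timeline file name (illustrative, not a Hudi class)."""
    instant: str
    action: str
    state: str


def parse_timeline_file(name: str) -> TimelineFile:
    # Split on dots and require exactly three non-empty parts,
    # matching the <instant>.<action>.<state> pattern in the text.
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not of the form <instant>.<action>.<state>: {name!r}")
    instant, action, state = parts
    # Assumption: the instant time is purely numeric, as in the
    # 20190117010349 example.
    if not instant.isdigit():
        raise ValueError(f"instant time is not numeric: {instant!r}")
    return TimelineFile(instant, action, state)


# Hypothetical file name built from the pattern above.
parsed = parse_timeline_file("20190117010349.commit.inflight")
```

Since each state is represented by the presence of a file, and the file is created by a single atomic put, a reader listing the timeline sees an instant either in its previous state or in its new one, never in between.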