Skip to main content
Version: 0.12.0

Procedures

Stored procedures available when use Hudi SparkSQL extensions in all spark's version.

Usage

CALL supports passing arguments by name (recommended) or by position. Mixing position and named arguments is also supported.

Named arguments

All procedure arguments are named. When passing arguments by name, arguments can be in any order and any optional argument can be omitted.

CALL system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1, ... arg_name_n => arg_n)

Positional arguments

When passing arguments by position, the arguments may be omitted if they are optional.

CALL system.procedure_name(arg_1, arg_2, ... arg_n)

note: The system here has no practical meaning, the complete procedure name is system.procedure_name.

Commit management

show_commits

Show commits' info.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringYNoneHudi table name
limitIntN10Max number of records to be returned

Output

Output NameType
commit_timeString
total_bytes_writtenLong
total_files_addedLong
total_files_updatedLong
total_partitions_writtenLong
total_records_writtenLong
total_update_records_writtenLong
total_errorsLong

Example

call show_commits(table => 'test_hudi_table', limit => 10);
commit_timetotal_bytes_writtentotal_files_addedtotal_files_updatedtotal_partitions_writtentotal_records_writtentotal_update_records_writtentotal_errors
20220216171049652432653011000
20220216171027021435346101100
20220216171019361435349101100

show_commits_metadata

Show commits' metadata.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringYNoneHudi table name
limitIntN10Max number of records to be returned

Output

Output NameType
commit_timeString
actionString
partitionString
file_idString
previous_commitString
num_writesLong
num_insertsLong
num_deletesLong
num_update_writesString
total_errorsLong
total_log_blocksLong
total_corrupt_logblocksLong
total_rollback_blocksLong
total_log_recordsLong
total_updated_records_compactedLong
total_bytes_writtenLong

Example

call show_commits_metadata(table => 'test_hudi_table');
commit_timeactionpartitionfile_idprevious_commitnum_writesnum_insertsnum_deletesnum_update_writestotal_errorstotal_log_blockstotal_corrupt_logblockstotal_rollback_blockstotal_log_recordstotal_updated_records_compactedtotal_bytes_written
20220109225319449commitdt=2021-05-03d0073a12-085d-4f49-83e9-402947e7e90a-0null1100000000435349
20220109225311742commitdt=2021-05-02b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0202201092148305921100000000435340
20220109225301429commitdt=2021-05-010d7298b3-6b55-4cff-8d7d-b0772358b78a-0202201092148305921100000000435340
20220109214830592commitdt=2021-05-010d7298b3-6b55-4cff-8d7d-b0772358b78a-0202201091916310150010000000432653
20220109214830592commitdt=2021-05-02b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0202201091916481810010000000432653
20220109191648181commitdt=2021-05-02b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0null1100000000435341
20220109191631015commitdt=2021-05-010d7298b3-6b55-4cff-8d7d-b0772358b78a-0null1100000000435341

rollback_to_instant

Rollback a table to the commit that was current at some time.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringYNoneHudi table name

Output

Output NameType
rollback_resultBoolean

Example

Roll back test_hudi_table to one instant

call rollback_to_instant(table => 'test_hudi_table', instant_time => '20220109225319449');
rollback_result
true

create_savepoints

Create a savepoint to hudi's table.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringYNoneHudi table name
commit_TimeStringYNoneCommit time
userStringN""User name
commentsStringN""Comments

Output

Output NameType
create_savepoint_resultBoolean

Example

Roll back test_hudi_table to one instant

call create_savepoints(table => 'test_hudi_table', instant_time => '20220109225319449');
create_savepoint_result
true

delete_savepoints

Delete a savepoint to hudi's table.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringYNoneHudi table name
instant_timeStringYNoneInstant time

Output

Output NameType
delete_savepoint_resultBoolean

Example

Delete a savepoint to test_hudi_table

call delete_savepoints(table => 'test_hudi_table', instant_time => '20220109225319449');
delete_savepoint_result
true

rollback_savepoints

Rollback a table to the commit that was current at some time.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringYNoneHudi table name
instant_timeStringYNoneInstant time

Output

Output NameType
rollback_savepoint_resultBoolean

Example

Rollback test_hudi_table to one savepoint

call rollback_savepoints(table => 'test_hudi_table', instant_time => '20220109225319449');
rollback_savepoint_result
true

Optimization table

run_clustering

Trigger clustering on a hoodie table. By using partition predicates, clustering table can be run with specified partitions, and you can also specify the order columns to sort data.

note

Newly clustering instant will be generated every call, and all pending clustering instants are executed. When calling this procedure, one of parameters table and path must be specified at least. If both parameters are given, table will take effect.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringNNoneName of table to be clustered
pathStringNNonePath of table to be clustered
predicateStringNNonePredicate to filter partition
orderStringNNoneOrder column split by ,

Output

Empty

Example

Clustering test_hudi_table with table name

call run_clustering(table => 'test_hudi_table')

Clustering test_hudi_table with table path

call run_clustering(path => '/tmp/hoodie/test_hudi_table')

Clustering test_hudi_table with table name, predicate and order column

call run_clustering(table => 'test_hudi_table', predicate => 'ts <= 20220408L', order => 'ts')

show_clustering

Show pending clusterings on a hoodie table.

note

When calling this procedure, one of parameters table and path must be specified at least. If both parameters are given, table will take effect.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringNNoneName of table to be clustered
pathStringNNonePath of table to be clustered
limitIntNNoneMax number of records to be returned

Output

Parameter NameTypeRequiredDefault ValueDescription
timestampStringNNoneInstant time
groupsIntNNoneNumber of file groups to be processed

Example

Show pending clusterings with table name

call show_clustering(table => 'test_hudi_table')
timestampgroups
202204081537079282
202204081536369633

Show pending clusterings with table path

call show_clustering(path => '/tmp/hoodie/test_hudi_table')
timestampgroups
202204081537079282
202204081536369633

Show pending clusterings with table name and limit

call show_clustering(table => 'test_hudi_table', limit => 1)
timestampgroups
202204081537079282

run_compaction

Schedule or run compaction on a hoodie table.

note

For scheduling compaction, if timestamp is specified, new scheduled compaction will use given timestamp as instant time. Otherwise, compaction will be scheduled by using current system time.

For running compaction, given timestamp must be a pending compaction instant time that already exists, if it's not, exception will be thrown. Meanwhile, if timestampis specified and there are pending compactions, all pending compactions will be executed without new compaction instant generated.

When calling this procedure, one of parameters table and pathmust be specified at least. If both parameters are given, table will take effect.

Input

Parameter NameTypeRequiredDefault ValueDescription
opStringNNoneOperation type, RUN or SCHEDULE
tableStringNNoneName of table to be compacted
pathStringNNonePath of table to be compacted
timestampStringNNoneInstant time

Output

The output of RUN operation is EMPTY, the output of SCHEDULE as follow:

Parameter NameTypeRequiredDefault ValueDescription
instantStringNNoneInstant name

Example

Run compaction with table name

call run_compaction(op => 'run', table => 'test_hudi_table')

Run compaction with table path

call run_compaction(op => 'run', path => '/tmp/hoodie/test_hudi_table')

Run compaction with table path and timestamp

call run_compaction(op => 'run', path => '/tmp/hoodie/test_hudi_table', timestamp => '20220408153658568')

Schedule compaction with table name

call run_compaction(op => 'schedule', table => 'test_hudi_table')
instant
20220408153650834

Schedule compaction with table path

call run_compaction(op => 'schedule', path => '/tmp/hoodie/test_hudi_table')
instant
20220408153650834

Schedule compaction with table path and timestamp

call run_compaction(op => 'schedule', path => '/tmp/hoodie/test_hudi_table', timestamp => '20220408153658568')
instant
20220408153658568

show_compaction

Show all compactions on a hoodie table, in-flight or completed compactions are included, and result will be in reverse order according to trigger time.

note

When calling this procedure, one of parameters tableand path must be specified at least. If both parameters are given, table will take effect.

Input

Parameter NameTypeRequiredDefault ValueDescription
tableStringNNoneName of table to show compaction
pathStringNNonePath of table to show compaction
limitIntNNoneMax number of records to be returned

Output

Parameter NameTypeRequiredDefault ValueDescription
timestampStringNNoneInstant time
actionStringNNoneAction name of compaction
sizeIntNNoneNumber of file slices to be compacted

Example

Show compactions with table name

call show_compaction(table => 'test_hudi_table')
timestampactionsize
20220408153707928compaction10
20220408153636963compaction10

Show compactions with table path

call show_compaction(path => '/tmp/hoodie/test_hudi_table')
timestampactionsize
20220408153707928compaction10
20220408153636963compaction10

Show compactions with table name and limit

call show_compaction(table => 'test_hudi_table', limit => 1)
timestampactionsize
20220408153707928compaction10