Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connector changes for time travel BEFORE clause. #22851

Merged
merged 1 commit into from
Jun 5, 2024

Conversation

gupteaj
Copy link
Contributor

@gupteaj gupteaj commented May 28, 2024

Presto issue : #21971

Note - This PR has iceberg connector changes, test and doc update. Syntax change for "BEFORE" keyword in presto engine is already merged.

This feature will allow iceberg connector to query historical data using BEFORE syntax on a table.
Time travel version option will read bigint snapshot id value for the table. Time travel timestamp option will read
timestamp-with-time-zone value for the table.

Examples

// Return table data for tab1 before snapshot id 8772871542276440693
select * from tab1 FOR SYSTEM_VERSION BEFORE 8772871542276440693;

// Return table data for tab1 before '2023-08-17 13:29:46.822 America/Los_Angeles'
select * from tab1 FOR SYSTEM_TIME BEFORE TIMESTAMP '2023-08-17 13:29:46.822 America/Los_Angeles';

Iceberg connector

  • change getSnapshotIdAsOfTime() method to getSnapshotIdTimeOperator(), which will return snapshot id based on EQUAL or LESS_THAN operator
  • update getSnapshotIdForTableVersion() to handle BEFORE cases
  • add getVersionOperator() in connector table version

Motivation and Context

Snowflake provides a BEFORE clause for time travel queries. When used in a query, this clause is specified in the FROM clause immediately after the table name. It determines the point in the past from which historical data is requested for the object. The AT keyword includes changes made by a statement or transaction with a timestamp equal to the specified parameter, while the BEFORE keyword refers to a point immediately preceding the specified parameter.

IBM Netezza supports BEFORE clause similar to Snowflake.

Impact

A new table level option to return specific snapshot for iceberg connector

Test Plan

  • Simple cases with FOR VERSION BEFORE and FOR TIMESTAMP BEFORE options on a table
  • Alias SYSTEM_TIME and SYSTEM_VERSION cases
  • Multiple table join query with VERSION and TIMESTAMP syntax
  • Multiple table joins, CTE, Create table as cases
  • Union, Intersect, Except, Subquery and view cases
  • Error handling cases for FOR VERSION BEFORE and FOR TIMESTAMP BEFORE options

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.


== RELEASE NOTES ==

Iceberg Connector Changes
* Add time travel ``BEFORE`` syntax for Iceberg tables to return historical data :pr:`22851`
* Improve time travel ``VERSION (SYSTEM_VERSION)`` syntax to include snapshot id using bigint data type :pr:`22851`
* Improve time travel ``TIMESTAMP (SYSTEM_TIME)`` syntax to include timestamp-with-time-zone data type :pr:`22851`

steveburnett
steveburnett previously approved these changes May 28, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local docs build, everything looks good. Thanks!

@steveburnett
Copy link
Contributor

Suggest revising the release note entry following the Order of changes and Section naming topics in the Release Notes Guidelines.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add time travel ``BEFORE`` syntax for Iceberg tables to return historical data. 
* Improve time travel ``VERSION (SYSTEM_VERSION)`` syntax to include snapshot id using bigint data type.
* Improve time travel ``TIMESTAMP (SYSTEM_TIME)`` syntax to include timestamp-with-time-zone data type. 

If the examples are valuable, suggest adding them to the Iceberg Connector documentation.

@tdcmeehan tdcmeehan mentioned this pull request May 30, 2024
6 tasks
@steveburnett
Copy link
Contributor

I overlooked "add PR # to the release note entry" in my previous comment, apologies for that!

== RELEASE NOTES ==

Iceberg Connector Changes
* Add time travel ``BEFORE`` syntax for Iceberg tables to return historical data :pr:`22851`
* Improve time travel ``VERSION (SYSTEM_VERSION)`` syntax to include snapshot id using bigint data type :pr:`22851`
* Improve time travel ``TIMESTAMP (SYSTEM_TIME)`` syntax to include timestamp-with-time-zone data type :pr:`22851`

@gupteaj gupteaj marked this pull request as ready for review May 31, 2024 15:35
@gupteaj gupteaj requested a review from presto-oss May 31, 2024 15:35
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work, change looks good to me. Only one little thing in test.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, review new local docs build, everything looks good. Thanks!

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @gupteaj

@tdcmeehan tdcmeehan merged commit 4245844 into prestodb:master Jun 5, 2024
57 checks passed
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants