Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Pre Process Metadata Call when facing CTE Issue #22487

Merged

Conversation

konjac-h
Copy link
Contributor

@konjac-h konjac-h commented Apr 11, 2024

Description

When handling the query below with Common Table Expression (CTE)

WITH temp_cte AS (
    SELECT *
    FROM tpch.tiny.orders
)
SELECT *
FROM temp_cte

If a QuerySpecification node has the Optional from, Visitor will visit the corresponding table (visitTable). Subsequently, MetadataExtractor will include the table in its list of tableNames and then call 1getTableColumnMetadata`.

It will lead to unnecessary invocations of getTableColumnMetadata and getTableHandle when a query has a Common Table Expression (CTE).

To fix this issue, we will need to determine if the table_name inside query from <table_name> is from the CTE expression. If yes, we need to exclude it without sending request to getTableColumnMetadata

Impact

Fixing the pre_process_meatdata calls. It will help us reduce a lot of unnecessary call to getTableColumnMetadata and latency.

Test Plan

  • Testing locally
  • Verifier Test
  • I have ran the query that had the issue with prev version but got it fixed now with pre_process_metadata_calls enabled
Screenshot 2024-04-11 at 4 33 06 PM

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

@konjac-h konjac-h requested review from jaystarshot and a team as code owners April 11, 2024 07:20
@konjac-h konjac-h force-pushed the fix-pre-process-metadata-call branch from 3de6f51 to 59400bd Compare April 11, 2024 07:23
@jaystarshot
Copy link
Member

Will it work if you pass analysis to the metadataExtractor call and then use the analysis to determine this like done in RelationPlanner here

@jaystarshot
Copy link
Member

Also is there a way to add tests?

protected Void visitWith(With node, MetadataExtractorContext context)
{
node.getQueries().forEach(query -> {
context.addCTE(query.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling it add addCTE, can we come up with some type agnostic name, such as skipProcessing(identifier)

@@ -186,7 +197,9 @@ protected Void visitTable(Table table, MetadataExtractorContext context)
}

// This could be either tableName, view, or MView
context.addTable(tableName);
if (!context.tableExistsInCommonTableExpressionNames(new Identifier(tableName.getObjectName()))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to below comment, this could be generalized by checking context.shouldSkipProcessing(identifier)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

@jainxrohit
Copy link
Contributor

is there a way to add tests?

+1, Lets add tests for with clause handling.

@konjac-h
Copy link
Contributor Author

Also is there a way to add tests?

Yea, I'm trying to figure out a valid test case which can fetch the corresponding Runtimestats info to help us verify it

@konjac-h konjac-h force-pushed the fix-pre-process-metadata-call branch from 59400bd to 8f80ac1 Compare April 15, 2024 23:30
@@ -125,17 +127,20 @@ private class MetadataExtractorContext
{
private final Optional<MetadataExtractorContext> parent;
private final Set<QualifiedObjectName> tableNames;
private final Set<Identifier> tableNamesToSkipProcessing; // Table names, such as query with CTE, that we don't need to get the metadata info so we should skip sending request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the comment to top line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks

jainxrohit
jainxrohit previously approved these changes Apr 17, 2024
Copy link
Contributor

@NikhilCollooru NikhilCollooru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NikhilCollooru NikhilCollooru merged commit 45387f9 into prestodb:master Apr 18, 2024
56 checks passed
@konjac-h
Copy link
Contributor Author

thanks nikhil for helping merge it!

@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks
@yingsu00
Copy link
Contributor

@konjac-h Is there notable performance improvement to CTE selections? If yes, can you please provide a release note? Does the following sound good?

Improve the performance of reading common table expressions (CTE).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants