Skip hbo stats recording for nodes with dynamic filter #22853

Merged
merged 1 commit into from
Jun 5, 2024

@feilong-liu feilong-liu commented May 28, 2024

Description

Presto CPP enables dynamic filter pushdown from the join build side to the join probe side. When this is enabled, the number of output rows reported on the probe side is lower than it would be without dynamic filter pushdown.
This is a problem if we still record those output row counts in HBO. For example, for T1 JOIN T2, after dynamic filter pushdown the probe side (T1) may output 0 rows, and HBO will record 0 rows for it. The next time we run this query, HBO will pick T1 as the build side because it looks smaller. However, the build side gets no dynamic filter pushdown, so T1 is read without filtering and the build side may OOM.

In this PR, I skip HBO stats recording for plan nodes affected by dynamic filter pushdown. This covers every plan node from the node the filter was pushed to (the scan node in the example above) up to, but not including, the join node.
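
To make the range of skipped nodes concrete, here is a minimal sketch of how the affected plan node ids could be collected. It is not the code in this PR: `Node`, `affectedNodeIds`, and the string ids are hypothetical stand-ins for Presto's plan node types, and the sketch assumes the id of the scan that received the filter is already known.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DynamicFilterSkipSketch {
    // Hypothetical minimal plan node; a stand-in, not Presto's PlanNode.
    static final class Node {
        final String id;
        final List<Node> children = new ArrayList<>();

        Node(String id, Node... children) {
            this.id = id;
            this.children.addAll(Arrays.asList(children));
        }
    }

    // Returns the ids of all nodes below the join (the join itself is excluded)
    // that lie on the path down to the scan that received the dynamic filter.
    static Set<String> affectedNodeIds(Node join, String filteredScanId) {
        Set<String> result = new HashSet<>();
        for (Node child : join.children) {
            collectPath(child, filteredScanId, result);
        }
        return result;
    }

    // True if the subtree rooted at node contains the filtered scan.
    // Every node on the path to that scan, including the scan, is added to out.
    private static boolean collectPath(Node node, String filteredScanId, Set<String> out) {
        boolean onPath = node.id.equals(filteredScanId);
        for (Node child : node.children) {
            // Non-short-circuit OR so every child subtree is visited.
            onPath |= collectPath(child, filteredScanId, out);
        }
        if (onPath) {
            out.add(node.id);
        }
        return onPath;
    }

    public static void main(String[] args) {
        Node scanT1 = new Node("scan_t1");
        Node filter = new Node("filter", scanT1);
        Node scanT2 = new Node("scan_t2");
        Node join = new Node("join", filter, scanT2);
        // HBO recording would be skipped for the ids in this set.
        System.out.println(affectedNodeIds(join, "scan_t1"));
    }
}
```

For a plan shaped like join(filter(scan_t1), scan_t2) with the filter pushed to scan_t1, the sketch marks scan_t1 and filter as affected, while the join node and scan_t2 keep their recorded stats.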

Motivation and Context

Described above

Impact

Avoid potential OOM due to inaccurate statistics caused by dynamic filter

Test Plan

End to end test
Control vs. Test
and unit test

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix HBO to skip tracking of stats for plan nodes affected by dynamic filter pushdown in Presto CPP :pr:`22853`

@feilong-liu feilong-liu requested a review from a team as a code owner May 28, 2024 21:31
@feilong-liu feilong-liu marked this pull request as draft May 28, 2024 21:31
@feilong-liu feilong-liu force-pushed the hbo_dynamic branch 4 times, most recently from 9aa48b4 to 49a1232 Compare June 3, 2024 20:54
@feilong-liu feilong-liu marked this pull request as ready for review June 4, 2024 05:01
@feilong-liu feilong-liu requested review from a team and jaystarshot as code owners June 4, 2024 05:01

@rschlussel rschlussel left a comment

What about dynamic filter for broadcast joins in Presto Java? 4332408

@feilong-liu

What about dynamic filter for broadcast joins in Presto Java? 4332408

I didn't know we had such support in Presto Java; I will take a look later. Ideally, if Presto Java can produce the same information as in #22734, it will be handled by the change here.

@xiaoxmeng xiaoxmeng left a comment

@feilong-liu thanks for the change!

Optional<DynamicFilterStats> optionalDynamicFilterStats = Optional.empty();
if (stats1.isPresent()) {
DynamicFilterStats dynamicFilterStats = stats1.get();
stats2.ifPresent(dynamicFilterStats::mergeWith);
Contributor

What does stats2.ifPresent(dynamicFilterStats::mergeWith) do here? Thanks!

Contributor Author

It's to merge the stats of the same plan node reported by different tasks
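
As an illustration of that merge, here is a minimal sketch under stated assumptions. `FilterStats` is a hypothetical stand-in for `DynamicFilterStats`; its `producerNodeIds` field and the behavior when only one side is present are assumptions, since the diff fragment above does not show them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

final class MergeStatsSketch {
    // Hypothetical stand-in for DynamicFilterStats; the field is an assumption.
    static final class FilterStats {
        final Set<String> producerNodeIds = new HashSet<>();

        FilterStats(String... ids) {
            producerNodeIds.addAll(Arrays.asList(ids));
        }

        // Folds another task's stats for the same plan node into this one.
        void mergeWith(FilterStats other) {
            producerNodeIds.addAll(other.producerNodeIds);
        }
    }

    // Combines the stats two tasks reported for the same plan node.
    static Optional<FilterStats> merge(Optional<FilterStats> stats1, Optional<FilterStats> stats2) {
        if (stats1.isPresent()) {
            // Copy first so the caller's stats1 is not mutated (a choice made
            // for this sketch; the reviewed snippet merges into stats1.get()).
            FilterStats merged = new FilterStats();
            merged.mergeWith(stats1.get());
            // ifPresent calls merged.mergeWith(...) only when stats2 has a value.
            stats2.ifPresent(merged::mergeWith);
            return Optional.of(merged);
        }
        // Assumed fallback: with no stats1, pass stats2 through unchanged.
        return stats2;
    }
}
```

The method reference `merged::mergeWith` is used as a `Consumer`, so `ifPresent` is a no-op when the second task reported nothing for that node.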

}
}
}
Set<PlanNodeId> planNodeIdsDynamicFilter = new HashSet<>();
Contributor

s/planNodeIdsDynamicFilter/planNodeIdsWithDynamicFilterApplied/

@xiaoxmeng xiaoxmeng merged commit 32c9f93 into prestodb:master Jun 5, 2024
59 checks passed
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks