-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip hbo stats recording for nodes with dynamic filter #22853
Conversation
9aa48b4
to
49a1232
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about dynamic filter for broadcast joins in Presto Java? 4332408
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feilong-liu thanks for the change!
presto-tests/src/test/java/com/facebook/presto/execution/TestHistoryBasedStatsTracking.java
Show resolved
Hide resolved
Optional<DynamicFilterStats> optionalDynamicFilterStats = Optional.empty(); | ||
if (stats1.isPresent()) { | ||
DynamicFilterStats dynamicFilterStats = stats1.get(); | ||
stats2.ifPresent(dynamicFilterStats::mergeWith); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does stats2.ifPresent(dynamicFilterStats::mergeWith) do here? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to merge the stats of the same plan node reported by different tasks
} | ||
} | ||
} | ||
Set<PlanNodeId> planNodeIdsDynamicFilter = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/planNodeIdsDynamicFilter/planNodeIdsWithDynamicFilterApplied/
Description
Presto CPP enable dynamic filter pushdown from join build to join probe side. When this is enabled, the number of output rows reported in the probe side will be less than the number of rows without dynamic filter pushdown.
This can be a problem if we still record the number of output rows in HBO. For example, for join T1 Join T2, after dynamic filter pushdown, it's possible that the probe side outputs 0 rows, and HBO will record 0 rows for probe side. Next time, we ran this query, HBO will use T1 as build side, as it's smaller. However now build side does not have dynamic filter pushdown, the build side may OOM due to no dynamic filter pushdown on build side.
In this PR, I will skip the HBO stats recording if it's affected by dynamic filter pushdown. It includes all plan nodes between the node where filter was pushed to (scan node in above example) and before the join node.
Motivation and Context
Described above
Impact
Avoid potential OOM due to inaccurate statistics caused by dynamic filter
Test Plan
End to end test
Control vs. Test
and unit test
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.