Test NestedLoopJoin and MergeJoin in join fuzzer #9901
This pull request was exported from Phabricator. Differential Revision: D57703982
How does this relate to #9898, which also adds alternative plans (including merge join)?
) Summary: A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), so this diff adds NestedLoopJoin query plans with and without TableScan to JoinFuzzer. It also adds MergeJoin with TableScan. Differential Revision: D57703982
@kagamiori thanks for the improvement % comments. Thanks!
@@ -131,6 +132,72 @@ class NestedLoopJoinTest : public HiveConnectorTestBase {
      buildKeyName_)};
};

TEST_F(NestedLoopJoinTest, aaa) {
aaa? :)
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
std::string makeJoinFilter(
    const std::vector<std::string>& probeKeys,
    const std::vector<std::string>& buildKeys) {
  std::string filter{};
const auto numKeys = probeKeys.size();
std::string filter;
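For readers following along, here is a minimal sketch of what a helper with this signature might do, assuming it builds a conjunction of pairwise key equalities for use as a NestedLoopJoin filter. The name `makeJoinFilterSketch` and the exact output format are assumptions for illustration; the actual body in JoinFuzzer.cpp is not shown in this thread and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the reviewed helper. It joins pairwise key
// equalities with " AND ". The real function in JoinFuzzer.cpp may differ.
std::string makeJoinFilterSketch(
    const std::vector<std::string>& probeKeys,
    const std::vector<std::string>& buildKeys) {
  const auto numKeys = probeKeys.size();
  // Each probe key must have a matching build key.
  assert(numKeys == buildKeys.size());
  std::string filter;
  for (std::size_t i = 0; i < numKeys; ++i) {
    if (i > 0) {
      filter += " AND ";
    }
    filter += probeKeys[i] + " = " + buildKeys[i];
  }
  return filter;
}
```

With probe keys `t_k0, t_k1` and build keys `u_k0, u_k1`, this sketch yields `t_k0 = u_k0 AND t_k1 = u_k1`.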
@@ -510,6 +510,21 @@ core::PlanNodePtr tryFlipJoinSides(const core::HashJoinNode& joinNode) {
      joinNode.outputType());
}

core::PlanNodePtr tryFlipJoinSides(const core::NestedLoopJoinNode& joinNode) {
Can you name this tryFlipNestedJoinSides? And rename the existing one to tryFlipHashJoinSides? Thanks!
Hi @xiaoxmeng, I intentionally gave the methods the same name for both nested loop join and hash join, because I call tryFlipJoinSides() in addFlippedJoinPlan(), which works for both join node types.
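The design choice in this reply can be illustrated with a small sketch: when two functions share a name and differ only in the node type they accept, a single templated caller can rely on overload resolution to pick the right one. The `*Sketch` structs and the string-based "plan" below are hypothetical simplifications; the real code works on `core::HashJoinNode`, `core::NestedLoopJoinNode`, and `core::PlanNodePtr`, and the real `addFlippedJoinPlan()` signature is not shown in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical simplified join nodes; not the Velox types.
struct HashJoinNodeSketch {
  std::string left;
  std::string right;
};

struct NestedLoopJoinNodeSketch {
  std::string left;
  std::string right;
};

// Two overloads with the same name, one per node type.
std::string tryFlipJoinSides(const HashJoinNodeSketch& node) {
  return "HashJoin(" + node.right + ", " + node.left + ")";
}

std::string tryFlipJoinSides(const NestedLoopJoinNodeSketch& node) {
  return "NestedLoopJoin(" + node.right + ", " + node.left + ")";
}

// One generic caller: the compiler selects the overload from TNode.
template <typename TNode>
void addFlippedJoinPlan(const TNode& node, std::vector<std::string>& plans) {
  plans.push_back(tryFlipJoinSides(node));
}
```

Renaming the overloads to distinct names, as the reviewer suggested, would require the generic caller to dispatch on the node type explicitly.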
@@ -813,6 +866,25 @@ void makeAlternativePlans(
                     joinNode->joinType())
                 .planNode()}});
  }

  // Use NestedLoopJoin.
  if (joinNode->isInnerJoin() || joinNode->isLeftJoin() ||
Can you put this into addNestedJoinPlans? And the same for addMergeJoinPlans? Thanks!
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
  // Use NestedLoopJoin.
  if (joinNode->isInnerJoin() || joinNode->isLeftJoin() ||
      joinNode->isFullJoin()) {
    auto filter = makeJoinFilter(probeKeys, buildKeys);
const auto filter
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
@@ -1114,6 +1186,67 @@ void JoinFuzzer::addPlansWithTableScan(
        groupedProbeScanSplits,
        groupedBuildScanSplits));
  }

  // Add ungrouped MergeJoin with TableScan.
  auto planNodeIdGenerator = std::make_shared<core::PlanNodeIdGenerator>();
Merge join and NestedLoopJoin don't work with grouped execution? Thanks!
No. Grouped execution means that both input tables are already partitioned by the join key, so that one driver can join the partitions from both sides on the same key value without knowing about the other partitions. According to the design doc (prestodb/presto#12124), Presto supports it only for hash join and hash aggregation.
@kagamiori LGTM % comments. Thanks!
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
                asRowType(joinNode->outputType())->names(),
                joinNode->joinType())
            .planNode()}});
    const auto filter = makeJoinFilter(probeKeys, buildKeys);
This is not required?
Good catch. I'll remove it.
      buildKeys,
      probeScanSplits,
      buildScanSplits,
      outputColumns);
Do we want to flip the build/probe sides for merge join?
No, merge join only supports inner and left join. I tried flipping, but that's not allowed.
==>
On second thought, we can actually flip the join sides for an inner merge join. Let me add that.
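The reasoning in this reply can be sketched as a small check: flipping the sides of a join changes a left join into a right join, while an inner join stays inner. If merge join supports only inner and left joins, as stated above, then only the inner case survives the flip. The enum and function names below are hypothetical and are not the Velox `core::JoinType` API.

```cpp
// Hypothetical join types for illustration; not core::JoinType.
enum class JoinTypeSketch { kInner, kLeft, kRight, kFull };

// Swapping probe and build sides mirrors left and right joins and
// leaves inner and full joins unchanged.
constexpr JoinTypeSketch flipJoinTypeSketch(JoinTypeSketch type) {
  switch (type) {
    case JoinTypeSketch::kLeft:
      return JoinTypeSketch::kRight;
    case JoinTypeSketch::kRight:
      return JoinTypeSketch::kLeft;
    case JoinTypeSketch::kInner:
    case JoinTypeSketch::kFull:
      return type;
  }
  return type;
}

// Assumption taken from the reply above: merge join handles only inner
// and left joins.
constexpr bool mergeJoinSupportsSketch(JoinTypeSketch type) {
  return type == JoinTypeSketch::kInner || type == JoinTypeSketch::kLeft;
}

// A flipped merge join plan is valid only if both the original and the
// flipped join types are supported.
constexpr bool canFlipMergeJoinSketch(JoinTypeSketch type) {
  return mergeJoinSupportsSketch(type) &&
      mergeJoinSupportsSketch(flipJoinTypeSketch(type));
}
```

Under these assumptions, `canFlipMergeJoinSketch` is true for an inner join and false for a left join, which matches the conclusion reached in the thread.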
…acebookincubator#9892) Summary: A join with no condition is a cross product. The existing code avoids adding mismatches to the result after a cross product, because a cross product should have matched all input rows (facebookincubator#6010). But there is an exception: when the build or probe side is empty, the cross product is empty too. Hence, for left, right, and full joins, mismatches should still be produced. This diff fixes the bug by still adding the mismatches to the result if either the build or the probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
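The corner case described in this commit message can be shown with a minimal sketch of a left join with no condition. It is not the Velox NestedLoopJoin operator; the function name and the use of `std::optional` to stand in for a NULL build column are assumptions for illustration.

```cpp
#include <optional>
#include <utility>
#include <vector>

// (probe value, build value or NULL)
using LeftJoinRow = std::pair<int, std::optional<int>>;

// Left join with no join condition. Normally every probe row matches
// every build row, so there are no mismatches to emit. When the build
// side is empty, each probe row must still appear once with a NULL
// build value.
std::vector<LeftJoinRow> leftJoinNoConditionSketch(
    const std::vector<int>& probe,
    const std::vector<int>& build) {
  std::vector<LeftJoinRow> out;
  for (const int probeValue : probe) {
    if (build.empty()) {
      out.emplace_back(probeValue, std::nullopt);
      continue;
    }
    for (const int buildValue : build) {
      out.emplace_back(probeValue, buildValue);
    }
  }
  return out;
}
```

With a non-empty build side the result is the plain cross product; with an empty build side the result has one row per probe row instead of being empty.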
This pull request has been merged in a06c53e.
Conbench analyzed the 1 benchmark run on commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
) Summary: Pull Request resolved: facebookincubator#9901 A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), so this diff adds NestedLoopJoin query plans with and without TableScan to JoinFuzzer. It also adds MergeJoin with TableScan. Reviewed By: xiaoxmeng Differential Revision: D57703982 fbshipit-source-id: 98d5d1cd6aa7a860e9b2bf1e1c10f9967f1c5dae
Summary:
A correctness bug was found in NestedLoopJoin recently (#9892),
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.
Differential Revision: D57703982