-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix double output for (Left|Right|Outer) without condition #6010
Fix double output for (Left|Right|Outer) without condition #6010
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@@ -255,6 +255,64 @@ TEST_F(NestedLoopJoinTest, basicCrossJoin) { | |||
"SELECT * FROM t, (SELECT * FROM UNNEST (ARRAY[10, 17, 10, 17, 10, 17, 10, 17])) u"); | |||
} | |||
|
|||
TEST_F(NestedLoopJoinTest, crossJoinWithoutFilterAndCriteria) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit : Rename test to outerJoinsWithoutCondition
7cdcb11
to
4111f25
Compare
createDuckDbTable("u", {buildVectors}); | ||
|
||
auto planNodeIdGenerator = std::make_shared<core::PlanNodeIdGenerator>(); | ||
auto executeOuterJoin = [&](core::JoinType joinType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit : rename testOuterJoin
4111f25
to
7b3f478
Compare
velox/exec/NestedLoopJoinProbe.cpp
Outdated
@@ -401,6 +401,11 @@ RowVectorPtr NestedLoopJoinProbe::doMatch(vector_size_t probeCnt) { | |||
VELOX_CHECK(!hasProbedAllBuildData()); | |||
|
|||
if (joinCondition_ == nullptr) { | |||
// All rows in SelectivityVector for probe and build must be marked valid | |||
// for a CrossJoin. | |||
probeMatched_.setAll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems inefficient to reset unused SelectivityVectors. Should we look into modifying the needsProbeMismatch and needsBuildMismatch instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. An approach to support "left/right/full cross join" would be returning false from needsProbe/BuildMisatch
when joinCondition_ == nullptr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova, @usurai : A third option could be to change buildMismatchedOutput to return nullptr if joinCondition_ = nullptr
https://github.com/facebookincubator/velox/blob/main/velox/exec/NestedLoopJoinProbe.cpp#L225
It seemed needsProbe/BuildMismatch is well abstracted to just look at joinType to determine if mismatching is needed.
CC: @usurai |
I'd like to share my perspective regarding cross joins and their compatibility with different join types. While I understand your viewpoint that cross joins could potentially support left and right join types, I tend to lean towards the idea that cross joins should remain distinct and not be extended to encompass inner or outer join functionalities. This separation can help maintain the clarity and purpose of each type of join, ensuring that their distinctive roles are preserved within SQL queries.
velox/velox/exec/tests/utils/PlanBuilder.h Lines 673 to 675 in 4b653ed
Anyway, if cross join with outer join types is needed, update |
7b3f478
to
1e926bd
Compare
Agree with @usurai that the current code is almost like 2 separate join cases put in the same operator. The bug was on account of the implementation of one side leaking to the other. It would be more clean to separate the CrossJoin from the outer joins with conditions. |
53ab6f4
to
6657052
Compare
velox/exec/NestedLoopJoinProbe.cpp
Outdated
@@ -401,6 +403,11 @@ RowVectorPtr NestedLoopJoinProbe::doMatch(vector_size_t probeCnt) { | |||
VELOX_CHECK(!hasProbedAllBuildData()); | |||
|
|||
if (joinCondition_ == nullptr) { | |||
// All rows in SelectivityVector for probe and build must be marked valid | |||
// for a CrossJoin. | |||
probeMatched_.setAll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karteekmurthys : These setAll(...) calls shouldn't be needed if we make the other change to needsProbe/BuildMismatch or add a check in buildMismatchedOutput.
6657052
to
1034a63
Compare
1034a63
to
7e06525
Compare
@@ -255,6 +255,44 @@ TEST_F(NestedLoopJoinTest, basicCrossJoin) { | |||
"SELECT * FROM t, (SELECT * FROM UNNEST (ARRAY[10, 17, 10, 17, 10, 17, 10, 17])) u"); | |||
} | |||
|
|||
TEST_F(NestedLoopJoinTest, testOuterJoin) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this testOuterJoinWithoutCondition so that the inner lambda can be renamed testOuterJoin ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: per naming convention, test methods should not have 'test' prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova : Yes, thanks my bad. I think this PR is done though. PTAL.
7e06525
to
b70b7bc
Compare
@usurai , @xiaoxmeng : Any chance you can take a look since Masha is on vacation ? |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng merged this pull request in 77e6d80. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…ncubator#6010) Summary: The NestedLoopJoin for Left/Right/Outer join without a condition is a cross join. Beyond that the code tries to find mismatched (build/probe) side rows to output will null values for outer join semantics. If a filter is present, these mismatched rows SelectivityVector was set correctly depending on the filter results. However, if there is no filter present (the cross join case), the SelectivityVector was not updated in this code-path leading to all the rows being output a second time. This PR fixes the problem by updating the SelectivityVector to indicate all rows are output already in the CrossJoin path. Needed for prestodb/presto#20381 and fixes facebookincubator#5715 Pull Request resolved: facebookincubator#6010 Reviewed By: Yuhta, amitkdutta Differential Revision: D48383119 Pulled By: xiaoxmeng fbshipit-source-id: 8a3725419d8fecd56827879ad6766cc479f468ac
…ncubator#6010) Summary: The NestedLoopJoin for Left/Right/Outer join without a condition is a cross join. Beyond that the code tries to find mismatched (build/probe) side rows to output will null values for outer join semantics. If a filter is present, these mismatched rows SelectivityVector was set correctly depending on the filter results. However, if there is no filter present (the cross join case), the SelectivityVector was not updated in this code-path leading to all the rows being output a second time. This PR fixes the problem by updating the SelectivityVector to indicate all rows are output already in the CrossJoin path. Needed for prestodb/presto#20381 and fixes facebookincubator#5715 Pull Request resolved: facebookincubator#6010 Reviewed By: Yuhta, amitkdutta Differential Revision: D48383119 Pulled By: xiaoxmeng fbshipit-source-id: 8a3725419d8fecd56827879ad6766cc479f468ac
Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…acebookincubator#9892) Summary: Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090
…9892) Summary: Pull Request resolved: #9892 Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090 fbshipit-source-id: ac5960faf33166c4660bba25d516a2ffc1b6276c
…acebookincubator#9892) Summary: Pull Request resolved: facebookincubator#9892 Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090 fbshipit-source-id: ac5960faf33166c4660bba25d516a2ffc1b6276c
…acebookincubator#9892) Summary: Pull Request resolved: facebookincubator#9892 Join with no condition is a cross product. The existing code avoid adding mismatch to the result after cross product because cross product should have matched all input rows (facebookincubator#6010). But there is an exception. When the build or probe side is empty, this cross product is empty too. Hence for left, right, and full join, mismatch should still be produced. This diff fixes this bug by still adding the mismatch to the result if either build or probe side is empty. Reviewed By: Yuhta Differential Revision: D57681090 fbshipit-source-id: ac5960faf33166c4660bba25d516a2ffc1b6276c
The NestedLoopJoin for Left/Right/Outer join without a condition is a cross join. Beyond that the code tries to find mismatched (build/probe) side rows to output will null values for outer join semantics.
If a filter is present, these mismatched rows SelectivityVector was set correctly depending on the filter results. However, if there is no filter present (the cross join case), the SelectivityVector was not updated in this code-path leading to all the rows being output a second time.
This PR fixes the problem by updating the SelectivityVector to indicate all rows are output already in the CrossJoin path.
Needed for prestodb/presto#20381 and fixes #5715