Fix double output for (Left|Right|Outer) without condition #6010

Closed

@karteekmurthys
Collaborator

karteekmurthys commented Aug 5, 2023

A NestedLoopJoin for a Left/Right/Outer join without a condition is a cross join. After producing the cross product, the code also looks for mismatched (build/probe) side rows to output with null values, as outer join semantics require.

If a filter is present, the SelectivityVector tracking these mismatched rows is set correctly from the filter results. However, if no filter is present (the cross join case), the SelectivityVector was not updated in this code path, so all the rows were output a second time.

This PR fixes the problem by updating the SelectivityVector in the CrossJoin path to indicate that all rows have already been output.

Needed for prestodb/presto#20381 and fixes #5715
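
The double output described above can be reproduced with a small standalone model. This is only a sketch under stated assumptions, not Velox code: the function `runLeftJoinWithoutCondition`, the `markMatched` flag, and the plain `std::vector<bool>` standing in for the SelectivityVector are all hypothetical names introduced for illustration, and only the probe (left) side is modeled.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// One output row: a probe value paired with a build value, or with no value
// (null) when the row is emitted as an outer-join mismatch.
using OutputRow = std::pair<int, std::optional<int>>;

// Hypothetical model of a LEFT join with no join condition.
// `probeMatched` stands in for the SelectivityVector that tracks which probe
// rows have already produced output.
std::vector<OutputRow> runLeftJoinWithoutCondition(
    const std::vector<int>& probe,
    const std::vector<int>& build,
    bool markMatched) {
  std::vector<OutputRow> output;
  std::vector<bool> probeMatched(probe.size(), false);

  // Cross-join path: every probe row pairs with every build row.
  for (int p : probe) {
    for (int b : build) {
      output.emplace_back(p, b);
    }
  }
  if (markMatched) {
    // The fix in this sketch: record that every probe row was already output.
    probeMatched.assign(probe.size(), true);
  }

  // Mismatch path: emit probe rows still marked unmatched, padded with null.
  for (std::size_t i = 0; i < probe.size(); ++i) {
    if (!probeMatched[i]) {
      output.emplace_back(probe[i], std::nullopt);
    }
  }
  return output;
}
```

With two probe rows and three build rows, calling this with `markMatched = false` returns 8 rows (the 6 cross-product rows plus 2 null-padded repeats), while `markMatched = true` returns the expected 6.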

@netlify

netlify bot commented Aug 5, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit b70b7bc
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/64d55f1bb191e8000860e4e0

@facebook-github-bot added the CLA Signed label Aug 5, 2023
@karteekmurthys marked this pull request as draft August 5, 2023 00:38
@aditi-pandit changed the title from "Fix matched rows for Cross Join without condition" to "Fix double output for (Left|Right|Outer) without condition" Aug 5, 2023
@@ -255,6 +255,64 @@ TEST_F(NestedLoopJoinTest, basicCrossJoin) {
"SELECT * FROM t, (SELECT * FROM UNNEST (ARRAY[10, 17, 10, 17, 10, 17, 10, 17])) u");
}

TEST_F(NestedLoopJoinTest, crossJoinWithoutFilterAndCriteria) {
Collaborator

Nit: rename the test to outerJoinsWithoutCondition.

velox/exec/tests/NestedLoopJoinTest.cpp (outdated)
createDuckDbTable("u", {buildVectors});

auto planNodeIdGenerator = std::make_shared<core::PlanNodeIdGenerator>();
auto executeOuterJoin = [&](core::JoinType joinType) {
Collaborator

Nit: rename to testOuterJoin.

@aditi-pandit marked this pull request as ready for review August 7, 2023 18:19
@@ -401,6 +401,11 @@ RowVectorPtr NestedLoopJoinProbe::doMatch(vector_size_t probeCnt) {
VELOX_CHECK(!hasProbedAllBuildData());

if (joinCondition_ == nullptr) {
// All rows in SelectivityVector for probe and build must be marked valid
// for a CrossJoin.
probeMatched_.setAll();
Contributor

It seems inefficient to reset unused SelectivityVectors. Should we look into modifying the needsProbeMismatch and needsBuildMismatch instead?

Contributor

@usurai Aug 8, 2023

Agreed. One way to support "left/right/full cross join" would be to return false from needsProbe/BuildMismatch when joinCondition_ == nullptr.

Collaborator

@aditi-pandit Aug 8, 2023

@mbasmanova, @usurai: A third option could be to change buildMismatchedOutput to return nullptr if joinCondition_ == nullptr
https://github.com/facebookincubator/velox/blob/main/velox/exec/NestedLoopJoinProbe.cpp#L225

It seemed that needsProbe/BuildMismatch is cleanly abstracted to look only at joinType when determining whether mismatching is needed.
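
The alternative raised in this thread, returning false from needsProbe/BuildMismatch when there is no join condition, can be sketched as below. This is an illustrative model, not the Velox source: the `JoinType` enum, the `hasJoinCondition` parameter, and the choice of which join types need which mismatches (left/full for the probe side, right/full for the build side) are assumptions.

```cpp
// Hypothetical stand-in for the join types discussed in this thread.
enum class JoinType { kInner, kLeft, kRight, kFull };

// Join-type-only checks (assumed shape): probe-side mismatches for left and
// full joins, build-side mismatches for right and full joins.
bool needsProbeMismatch(JoinType joinType) {
  return joinType == JoinType::kLeft || joinType == JoinType::kFull;
}

bool needsBuildMismatch(JoinType joinType) {
  return joinType == JoinType::kRight || joinType == JoinType::kFull;
}

// Variant that also considers the join condition: without a condition the
// join is a cross product, so no mismatch tracking is requested at all.
bool needsProbeMismatch(JoinType joinType, bool hasJoinCondition) {
  return hasJoinCondition && needsProbeMismatch(joinType);
}

bool needsBuildMismatch(JoinType joinType, bool hasJoinCondition) {
  return hasJoinCondition && needsBuildMismatch(joinType);
}
```

The trade-off discussed here is where the check lives: the two-argument variant avoids touching unused SelectivityVectors, while an early return in buildMismatchedOutput would leave the join-type-only helpers unchanged.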

@mbasmanova
Contributor

CC: @usurai

@usurai
Contributor

usurai commented Aug 8, 2023

I'd like to share my perspective on cross joins and their compatibility with different join types. I understand the view that cross joins could support left and right join types, but I lean towards keeping cross joins distinct rather than extending them to cover inner or outer join functionality. This separation keeps the purpose of each join type clear within SQL queries.
With that in mind, my design for nested loop join was:

  1. With a join condition, the join is a nested loop join that supports inner/outer join types.
  2. Without a join condition, the join is a cross join (or cross product) and has no join type support. This is why the PlanBuilder function for cross join has no join type parameter:

PlanBuilder& nestedLoopJoin(
    const core::PlanNodePtr& right,
    const std::vector<std::string>& outputLayout);

Anyway, if cross join with outer join types is needed, updating needsProbeMismatch and needsBuildMismatch to return false when joinCondition_ == nullptr would be enough and efficient.

@aditi-pandit
Collaborator

Agree with @usurai that the current code is almost like two separate join cases put in the same operator. The bug arose because the implementation of one case leaked into the other. It would be cleaner to separate the CrossJoin from the outer joins with conditions.

@karteekmurthys force-pushed the nestedloop-join-fix branch 2 times, most recently from 53ab6f4 to 6657052 August 8, 2023 18:40
@@ -401,6 +403,11 @@ RowVectorPtr NestedLoopJoinProbe::doMatch(vector_size_t probeCnt) {
VELOX_CHECK(!hasProbedAllBuildData());

if (joinCondition_ == nullptr) {
// All rows in SelectivityVector for probe and build must be marked valid
// for a CrossJoin.
probeMatched_.setAll();
Collaborator

@karteekmurthys : These setAll(...) calls shouldn't be needed if we make the other change to needsProbe/BuildMismatch or add a check in buildMismatchedOutput.

@@ -255,6 +255,44 @@ TEST_F(NestedLoopJoinTest, basicCrossJoin) {
"SELECT * FROM t, (SELECT * FROM UNNEST (ARRAY[10, 17, 10, 17, 10, 17, 10, 17])) u");
}

TEST_F(NestedLoopJoinTest, testOuterJoin) {
Collaborator

Maybe rename this to testOuterJoinWithoutCondition so that the inner lambda can be renamed testOuterJoin?

Contributor

FYI: per the naming convention, test methods should not have a 'test' prefix.

Collaborator

@aditi-pandit Aug 10, 2023

@mbasmanova : Yes, thanks my bad. I think this PR is done though. PTAL.

@aditi-pandit
Collaborator

@usurai , @xiaoxmeng : Any chance you can take a look since Masha is on vacation ?

@facebook-github-bot
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@xiaoxmeng merged this pull request in 77e6d80.

@conbench-facebook

Conbench analyzed the 1 benchmark run on commit 77e6d806.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

unigof pushed a commit to unigof/velox that referenced this pull request Aug 18, 2023
…ncubator#6010)

Summary:
A NestedLoopJoin for a Left/Right/Outer join without a condition is a cross join. After producing the cross product, the code also looks for mismatched (build/probe) side rows to output with null values, as outer join semantics require.

If a filter is present, the SelectivityVector tracking these mismatched rows is set correctly from the filter results. However, if no filter is present (the cross join case), the SelectivityVector was not updated in this code path, so all the rows were output a second time.

This PR fixes the problem by updating the SelectivityVector in the CrossJoin path to indicate that all rows have already been output.

Needed for prestodb/presto#20381 and fixes facebookincubator#5715

Pull Request resolved: facebookincubator#6010

Reviewed By: Yuhta, amitkdutta

Differential Revision: D48383119

Pulled By: xiaoxmeng

fbshipit-source-id: 8a3725419d8fecd56827879ad6766cc479f468ac
kagamiori added a commit to kagamiori/velox that referenced this pull request May 22, 2024
Summary:
A join with no condition is a cross product. The existing code avoids
adding mismatches to the result after the cross product because the cross
product should have matched all input rows (facebookincubator#6010). But there
is an exception: when the build or probe side is empty, the cross
product is empty too. Hence for left, right, and full joins, mismatches
should still be produced. This diff fixes the bug by still adding the
mismatches to the result if either the build or the probe side is empty.

Differential Revision: D57681090
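
The exception described in this summary can be illustrated with a short decision helper. This is a sketch under assumptions, not the code from the referenced diff: `JoinType`, `MismatchPlan`, and `planMismatches` are hypothetical names, and the probe side is assumed to be the left input.

```cpp
#include <cstddef>

// Hypothetical stand-in for the join types named in the summary.
enum class JoinType { kInner, kLeft, kRight, kFull };

// Which sides must still emit null-padded mismatch rows.
struct MismatchPlan {
  bool probe;
  bool build;
};

// For a join WITHOUT a condition: decide whether mismatch rows are needed
// after the cross product has been produced.
MismatchPlan planMismatches(
    JoinType joinType,
    std::size_t probeRows,
    std::size_t buildRows) {
  // The cross product is empty as soon as either input is empty.
  const bool crossProductEmpty = probeRows == 0 || buildRows == 0;
  const bool wantsProbe =
      joinType == JoinType::kLeft || joinType == JoinType::kFull;
  const bool wantsBuild =
      joinType == JoinType::kRight || joinType == JoinType::kFull;

  // Non-empty cross product: every row already matched, so nothing to add.
  // Empty cross product: rows on the non-empty side matched nothing and must
  // be emitted with nulls if the join type preserves that side.
  return MismatchPlan{
      crossProductEmpty && wantsProbe && probeRows > 0,
      crossProductEmpty && wantsBuild && buildRows > 0};
}
```

For example, a left join with three probe rows and an empty build side yields `probe == true`, whereas the same join with a non-empty build side yields `probe == false`, because the cross product already contains every probe row.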
facebook-github-bot pushed a commit that referenced this pull request May 28, 2024
…9892)

Summary:
Pull Request resolved: #9892

A join with no condition is a cross product. The existing code avoids
adding mismatches to the result after the cross product because the cross
product should have matched all input rows (#6010). But there
is an exception: when the build or probe side is empty, the cross
product is empty too. Hence for left, right, and full joins, mismatches
should still be produced. This diff fixes the bug by still adding the
mismatches to the result if either the build or the probe side is empty.

Reviewed By: Yuhta

Differential Revision: D57681090

fbshipit-source-id: ac5960faf33166c4660bba25d516a2ffc1b6276c
Labels
CLA Signed, Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Left/Right Join without join criteria must be implemented as NestedLoopJoin
6 participants