Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test NestedLoopJoin and MergeJoin in join fuzzer #9901

Closed
wants to merge 2 commits into from

Conversation

kagamiori
Copy link
Contributor

@kagamiori kagamiori commented May 23, 2024

Summary:
A correctness bug was found in NestedLoopJoin recently (#9892),
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 23, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57703982

Copy link

netlify bot commented May 23, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 9e22983
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/665612148d403700084842ab

kagamiori added a commit to kagamiori/velox that referenced this pull request May 23, 2024
)

Summary:

This diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57703982

@pedroerp
Copy link
Contributor

pedroerp commented May 23, 2024

How does this relate to #9898 which also adds alternative plans (including merge join)?

@kagamiori
Copy link
Contributor Author

How does this relate to #9898 which also adds alternative plans (including merge join)?

Hi @pedroerp, the merge join plan added in makeAlternativePlans() only reads data from Values node. This PR adds merge join plan that read from TableScan node.

kagamiori added a commit to kagamiori/velox that referenced this pull request May 24, 2024
)

Summary:

This diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
kagamiori added a commit to kagamiori/velox that referenced this pull request May 24, 2024
)

Summary:

This diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
kagamiori added a commit to kagamiori/velox that referenced this pull request May 24, 2024
)

Summary:

This diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57703982

kagamiori added a commit to kagamiori/velox that referenced this pull request May 24, 2024
)

Summary:

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), 
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
kagamiori added a commit to kagamiori/velox that referenced this pull request May 24, 2024
)

Summary:

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), 
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
kagamiori added a commit to kagamiori/velox that referenced this pull request May 24, 2024
)

Summary:

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), 
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori thanks for the improvement % comments. Thanks!

@@ -131,6 +132,72 @@ class NestedLoopJoinTest : public HiveConnectorTestBase {
buildKeyName_)};
};

TEST_F(NestedLoopJoinTest, aaa) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaa? :)

velox/exec/NestedLoopJoinProbe.h Show resolved Hide resolved
std::string makeJoinFilter(
const std::vector<std::string>& probeKeys,
const std::vector<std::string>& buildKeys) {
std::string filter{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto numKeys = probeKeys.size();
std::string filter;

@@ -510,6 +510,21 @@ core::PlanNodePtr tryFlipJoinSides(const core::HashJoinNode& joinNode) {
joinNode.outputType());
}

core::PlanNodePtr tryFlipJoinSides(const core::NestedLoopJoinNode& joinNode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you name this tryFlipNestedJoinSides? And rename the existing one to tryFlipHashJoinSides? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @xiaoxmeng, I actually intentionally make the methods use the same name for both nested loop join and hash join. This is because I call tryFlipJoinSides() in addFlippedJoinPlan() that works for both join nodes.

@@ -813,6 +866,25 @@ void makeAlternativePlans(
joinNode->joinType())
.planNode()}});
}

// Use NestedLoopJoin.
if (joinNode->isInnerJoin() || joinNode->isLeftJoin() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this into
addNestedJoinPlans? and the same for addMergeJoinPlans? thanks!

// Use NestedLoopJoin.
if (joinNode->isInnerJoin() || joinNode->isLeftJoin() ||
joinNode->isFullJoin()) {
auto filter = makeJoinFilter(probeKeys, buildKeys);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto filter

@@ -1114,6 +1186,67 @@ void JoinFuzzer::addPlansWithTableScan(
groupedProbeScanSplits,
groupedBuildScanSplits));
}

// Add ungrouped MergeJoin with TableScan.
auto planNodeIdGenerator = std::make_shared<core::PlanNodeIdGenerator>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge join and NestedLoopJoin doesn't work with grouped execution? Thanks!

Copy link
Contributor Author

@kagamiori kagamiori May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, grouped execution means the both input tables are already partitioned by the join key, so that one driver can join the partitions from both side on the same key value without knowing other partitions. Presto only supports hash join and hash aggregation according to the design doc (prestodb/presto#12124).

kagamiori added a commit to kagamiori/velox that referenced this pull request May 25, 2024
)

Summary:

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), 
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Differential Revision: D57703982
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57703982

@kagamiori kagamiori requested a review from xiaoxmeng May 25, 2024 00:41
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori LGTM % comments. Thanks!

asRowType(joinNode->outputType())->names(),
joinNode->joinType())
.planNode()}});
const auto filter = makeJoinFilter(probeKeys, buildKeys);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll remove it.

buildKeys,
probeScanSplits,
buildScanSplits,
outputColumns);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to flip the build/probe sides for merge join?

Copy link
Contributor Author

@kagamiori kagamiori May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, merge join only supports inner and left join. I've tried flipping but that's not allowed.
==>
Thinking twice, actually we can flip join sides for inner merge join. Let me add that.

kagamiori added a commit to kagamiori/velox that referenced this pull request May 25, 2024
)

Summary:

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), 
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Reviewed By: xiaoxmeng

Differential Revision: D57703982
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57703982

kagamiori added a commit to kagamiori/velox that referenced this pull request May 25, 2024
)

Summary:

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), 
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Reviewed By: xiaoxmeng

Differential Revision: D57703982
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57703982

…acebookincubator#9892)

Summary:

Join with no condition is a cross product. The existing code avoid 
adding mismatch to the result after cross product because cross 
product should have matched all input rows (facebookincubator#6010). But there 
is an exception. When the build or probe side is empty, this cross 
product is empty too. Hence for left, right, and full join, mismatch 
should still be produced. This diff fixes this bug by still adding the 
mismatch to the result if either build or probe side is empty.

Reviewed By: Yuhta

Differential Revision: D57681090
)

Summary:

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892), 
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Reviewed By: xiaoxmeng

Differential Revision: D57703982
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57703982

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in a06c53e.

Copy link

Conbench analyzed the 1 benchmark run on commit a06c53e6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
)

Summary:
Pull Request resolved: facebookincubator#9901

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892),
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Reviewed By: xiaoxmeng

Differential Revision: D57703982

fbshipit-source-id: 98d5d1cd6aa7a860e9b2bf1e1c10f9967f1c5dae
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
)

Summary:
Pull Request resolved: facebookincubator#9901

A correctness bug was found in NestedLoopJoin recently (facebookincubator#9892),
so this diff adds NestedLoopJoin query plans with and without
TableScan to JoinFuzzer. It also adds MergeJoin with TableScan.

Reviewed By: xiaoxmeng

Differential Revision: D57703982

fbshipit-source-id: 98d5d1cd6aa7a860e9b2bf1e1c10f9967f1c5dae
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants