
Bloom filter Join Step I: create benchmark #11933

Open: wants to merge 2 commits into main

Lordworms (Contributor)

Which issue does this PR close?

part of #7955

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

alamb (Contributor) left a comment:

Thanks @Lordworms. I realize I am very behind on reviews in DataFusion.

My first question about these benchmarks is whether they measure the right thing, namely whether their run time is dominated by the join. Have you had a chance to run any profiling (flamegraphs, etc.) to confirm that these benchmarks are actually join-dominated?
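Besides a flamegraph, a quick way to check whether a benchmark is join-dominated is to time the data-generation phase and the join phase separately and compare them. The Rust below is a minimal, std-only sketch of that idea, not DataFusion's benchmark code: `generate_rows`, `hash_join`, `profile_phases`, and `join_share` are hypothetical names, and the in-memory key generation stands in for whatever setup the real benchmark does (for example, writing Parquet files).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for benchmark setup (e.g. writing Parquet files):
/// build keys are 0..n/10, probe keys cycle through 0..n/5.
fn generate_rows(n: u64) -> (Vec<u64>, Vec<u64>) {
    let build: Vec<u64> = (0..n / 10).collect();
    let modulus = (n / 5).max(1); // avoid modulo-by-zero for tiny n
    let probe: Vec<u64> = (0..n).map(|i| i % modulus).collect();
    (build, probe)
}

/// Simple in-memory hash join that counts matching (build, probe) pairs.
fn hash_join(build: &[u64], probe: &[u64]) -> usize {
    let mut table: HashMap<u64, usize> = HashMap::new();
    for &k in build {
        *table.entry(k).or_insert(0) += 1;
    }
    probe.iter().map(|k| table.get(k).copied().unwrap_or(0)).sum()
}

/// Times setup and join separately; returns (setup time, join time, match count).
fn profile_phases(n: u64) -> (Duration, Duration, usize) {
    let start = Instant::now();
    let (build, probe) = generate_rows(n);
    let setup = start.elapsed();

    let start = Instant::now();
    let matches = hash_join(&build, &probe);
    let join = start.elapsed();
    (setup, join, matches)
}

/// Fraction of the measured time spent in the join phase (0.0 if nothing was measured).
fn join_share(setup: Duration, join: Duration) -> f64 {
    let total = setup + join;
    if total.is_zero() {
        0.0
    } else {
        join.as_secs_f64() / total.as_secs_f64()
    }
}

fn main() {
    let (setup, join, matches) = profile_phases(1_000_000);
    println!(
        "setup: {:?}, join: {:?}, matches: {}, join share: {:.0}%",
        setup,
        join,
        matches,
        join_share(setup, join) * 100.0
    );
}
```

In a real benchmark the setup would normally sit outside the measured loop, so that file creation does not inflate the reported join time.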

Review comment on datafusion/core/benches/bloom_filter_join.rs (outdated, resolved)
alamb marked this pull request as draft on August 20, 2024 at 17:47
alamb (Contributor) commented on Aug 20, 2024:

Marking as draft, as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look.

Lordworms (Contributor, Author) commented:

For TPC-H query 17, when we create 1,000,000 rows for the lineitem and part tables, the join accounts for about 50% of the time; most of the remaining time is spent creating the Parquet files.
[Screenshot, 2024-08-25 1:37:15 PM]

Lordworms (Contributor, Author) commented:

For the second case, 95% of the time is spent on the join.

Lordworms (Contributor, Author) commented:

I think it is worth trying to implement join predicate pushdown.
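A Bloom filter join is one form of this kind of pushdown: the build side of the join inserts its keys into a Bloom filter, and the probe side drops rows whose keys are definitely absent before they reach the join. The Rust below is a minimal, std-only sketch under that assumption, not DataFusion's API or the design planned for this PR series; `BloomFilter`, `bloom_filtered_join`, the 10-bits-per-key sizing, and the 7 hash functions (derived by double hashing from std's `DefaultHasher`) are illustrative choices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter over u64 join keys (illustrative only).
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: usize,
    num_hashes: u32,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u32) -> Self {
        let words = (num_bits.max(1) + 63) / 64; // at least one 64-bit word
        BloomFilter { bits: vec![0; words], num_bits: words * 64, num_hashes }
    }

    /// Bit positions via double hashing: h_i = h1 + i * h2 (mod num_bits).
    fn positions(&self, key: u64) -> Vec<usize> {
        let mut a = DefaultHasher::new();
        (0u8, key).hash(&mut a);
        let mut b = DefaultHasher::new();
        (1u8, key).hash(&mut b);
        let (h1, h2) = (a.finish(), b.finish() | 1);
        let m = self.num_bits as u64;
        (0..u64::from(self.num_hashes))
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
            .collect()
    }

    fn insert(&mut self, key: u64) {
        for p in self.positions(key) {
            self.bits[p / 64] |= 1u64 << (p % 64);
        }
    }

    /// false => key is definitely absent; true => key may be present.
    fn might_contain(&self, key: u64) -> bool {
        self.positions(key)
            .iter()
            .all(|&p| (self.bits[p / 64] & (1u64 << (p % 64))) != 0)
    }
}

/// Builds a filter from build-side keys, applies it to the probe side
/// ("pushed down" before the join), then does an exact semi-join check.
/// Returns (matched probe keys, number of probe rows that passed the filter).
fn bloom_filtered_join(build: &[u64], probe: &[u64]) -> (Vec<u64>, usize) {
    let mut filter = BloomFilter::new(build.len() * 10, 7);
    for &k in build {
        filter.insert(k);
    }
    let survivors: Vec<u64> = probe.iter().copied().filter(|&k| filter.might_contain(k)).collect();
    let build_set: HashSet<u64> = build.iter().copied().collect();
    let matched: Vec<u64> = survivors.iter().copied().filter(|k| build_set.contains(k)).collect();
    (matched, survivors.len())
}

fn main() {
    let build: Vec<u64> = (0..1_000).map(|i| i * 3).collect();
    let probe: Vec<u64> = (0..100_000).collect();
    let (matched, survivors) = bloom_filtered_join(&build, &probe);
    println!(
        "probe rows: {}, after Bloom filter: {}, matched: {}",
        probe.len(),
        survivors,
        matched.len()
    );
}
```

Because a Bloom filter has no false negatives, the exact check after filtering still finds every true match; false positives only let extra probe rows through. The filter therefore pays off when many probe rows have no partner, and its size trades memory against the false-positive rate.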

Lordworms marked this pull request as ready for review on August 25, 2024 at 22:06
Labels: core (Core DataFusion crate)