Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the hash join performance by replacing the RawTable to a simple Vec for JoinHashMap #6910

Open
yahoNanJing opened this issue Jul 11, 2023 · 5 comments
Labels
enhancement New feature or request performance Make DataFusion faster

Comments

@yahoNanJing
Copy link
Contributor

Is your feature request related to a problem or challenge?

When testing the TPCH q17 on my PC, based on #6800, it costs around 2.4s. Among them, it costs around 800ms for constructing the RawTable of JoinHashMap.

To only find the hash location, the structure, RawTable, we used is too complex and the overhead is too heavy.

Describe the solution you'd like

We can simple use a fixed size of vector with a hash mask to look up the hash location.

After applying the replacement, it only cost less than 100ms for the construction of JoinHashMap. And the overall time cost for the q17 is reduced from 2.4s to 1.8s.

Describe alternatives you've considered

No response

Additional context

No response

@yahoNanJing yahoNanJing added the enhancement New feature or request label Jul 11, 2023
@Dandandan
Copy link
Contributor

Dandandan commented Jul 11, 2023

That's interesting, wonder if you're not getting regressions on other queries (as probing might be slower?) 🤔 At least that's what I got in some earlier experiments.

Q17 is also interesting because it fails to plan/estimate the join correctly, it puts the largest side on the build side.

@Dandandan
Copy link
Contributor

I think it makes sense to first work on making equality faster, then testing this approach again #12131

@Dandandan Dandandan added the performance Make DataFusion faster label Aug 23, 2024
@Dandandan
Copy link
Contributor

I am testing this currently to see how this approach is running.

It seems #6724 and #6679 improved this already for quite a bit (maybe not enough)

@Dandandan
Copy link
Contributor

Dandandan commented Aug 23, 2024

Current results (closer to mixed results, but more regressions than wins)

--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ vectorized_hashtable_join ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  74.26ms │                   74.10ms │     no change │
│ QQuery 2     │  15.71ms │                   15.78ms │     no change │
│ QQuery 3     │  25.31ms │                   28.83ms │  1.14x slower │
│ QQuery 4     │  20.70ms │                   26.90ms │  1.30x slower │
│ QQuery 5     │  37.37ms │                   44.74ms │  1.20x slower │
│ QQuery 6     │   5.46ms │                    6.41ms │  1.17x slower │
│ QQuery 7     │  78.58ms │                   93.71ms │  1.19x slower │
│ QQuery 8     │  19.23ms │                   15.31ms │ +1.26x faster │
│ QQuery 9     │  46.49ms │                   50.21ms │  1.08x slower │
│ QQuery 10    │  42.36ms │                   43.06ms │     no change │
│ QQuery 11    │  30.29ms │                   32.25ms │  1.06x slower │
│ QQuery 12    │  23.48ms │                   22.19ms │ +1.06x faster │
│ QQuery 13    │  28.99ms │                   29.42ms │     no change │
│ QQuery 14    │   7.79ms │                    8.03ms │     no change │
│ QQuery 15    │  13.77ms │                   12.72ms │ +1.08x faster │
│ QQuery 16    │  17.16ms │                   16.70ms │     no change │
│ QQuery 17    │  61.71ms │                   54.08ms │ +1.14x faster │
│ QQuery 18    │ 138.43ms │                  186.12ms │  1.34x slower │
│ QQuery 19    │  22.48ms │                   22.02ms │     no change │
│ QQuery 20    │  29.27ms │                   28.51ms │     no change │
│ QQuery 21    │  93.84ms │                  107.18ms │  1.14x slower │
│ QQuery 22    │   9.03ms │                    9.97ms │  1.10x slower │
└──────────────┴──────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                        ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                        │ 841.72ms │
│ Total Time (vectorized_hashtable_join)   │ 928.25ms │
│ Average Time (main)                      │  38.26ms │
│ Average Time (vectorized_hashtable_join) │  42.19ms │
│ Queries Faster                           │        4 │
│ Queries Slower                           │       10 │
│ Queries with No Change                   │        8 │
└──────────────────────────────────────────┴──────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ vectorized_hashtable_join ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 141.17ms │                  140.95ms │     no change │
│ QQuery 2     │  64.63ms │                   65.74ms │     no change │
│ QQuery 3     │  64.88ms │                   71.23ms │  1.10x slower │
│ QQuery 4     │  48.98ms │                   49.53ms │     no change │
│ QQuery 5     │  75.78ms │                   81.51ms │  1.08x slower │
│ QQuery 6     │  30.14ms │                   30.58ms │     no change │
│ QQuery 7     │  98.22ms │                  110.42ms │  1.12x slower │
│ QQuery 8     │  87.66ms │                   83.27ms │ +1.05x faster │
│ QQuery 9     │ 132.67ms │                  134.53ms │     no change │
│ QQuery 10    │ 120.88ms │                  120.26ms │     no change │
│ QQuery 11    │  43.79ms │                   49.48ms │  1.13x slower │
│ QQuery 12    │  82.68ms │                   80.17ms │     no change │
│ QQuery 13    │ 185.70ms │                  185.21ms │     no change │
│ QQuery 14    │  46.67ms │                   46.54ms │     no change │
│ QQuery 15    │  59.51ms │                   59.43ms │     no change │
│ QQuery 16    │  45.77ms │                   49.33ms │  1.08x slower │
│ QQuery 17    │ 122.20ms │                  111.54ms │ +1.10x faster │
│ QQuery 18    │ 160.71ms │                  171.14ms │  1.06x slower │
│ QQuery 19    │  88.44ms │                   87.32ms │     no change │
│ QQuery 20    │  71.72ms │                   72.27ms │     no change │
│ QQuery 21    │ 124.59ms │                  138.07ms │  1.11x slower │
│ QQuery 22    │  36.48ms │                   37.85ms │     no change │
└──────────────┴──────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                        │ 1933.26ms │
│ Total Time (vectorized_hashtable_join)   │ 1976.38ms │
│ Average Time (main)                      │   87.88ms │
│ Average Time (vectorized_hashtable_join) │   89.84ms │
│ Queries Faster                           │         2 │
│ Queries Slower                           │         7 │
│ Queries with No Change                   │        13 │
└──────────────────────────────────────────┴───────────┘

@Dandandan
Copy link
Contributor

I am going to experiment with a slight variation on this this weekend to reduce nr of collisions greatly while still using Vec-based indexing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Make DataFusion faster
Projects
None yet
2 participants