Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache output equivalence_properties in ProjectionExec #9097

Closed
wants to merge 6 commits into from

Conversation

gruuya
Copy link
Contributor

@gruuya gruuya commented Feb 1, 2024

Which issue does this PR close?

Closes #9084.

Rationale for this change

Avoid potential exponential explosions in branching calls to output_partitioning and equivalence_properties in certain plan combinations.

What changes are included in this PR?

Simply store the equivalence_properties, which are always calculated anyway in ProjectionExec.

Are these changes tested?

They were tested on the example from #9084

Are there any user-facing changes?

Not really, apart from not stalling TPC-DS q64.

@@ -56,6 +56,8 @@ pub struct ProjectionExec {
input: Arc<dyn ExecutionPlan>,
/// The output ordering
output_ordering: Option<Vec<PhysicalSortExpr>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this doesn't need to be stored now, since it can be obtained via self.equivalence_properties.oeq_class().output_ordering(), however the last method has some non-trivial computation which are probably best avoided anytime there's a call to ProjectionExec::output_ordering()

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 1, 2024
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Feb 1, 2024
@simonvandel
Copy link
Contributor

Out of curiousity I ran the sql_planner benchmark on this PR compared to c843226 (main at the time of writing).

Even though the PR is not targetted at those queries, the improvements seem to be in the range of no-change to -3% 👍

Results
     Running benches/sql_planner.rs (/home/svs/code/arrow-datafusion/target/release-nonlto/deps/sql_planner-89232c907fea5c25)
Gnuplot not found, using plotters backend
Benchmarking logical_select_one_from_700: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
logical_select_one_from_700
                        time:   [1.3148 ms 1.3211 ms 1.3283 ms]
                        change: [-1.2182% -0.5419% +0.1004%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

physical_select_one_from_700
                        time:   [4.5290 ms 4.5467 ms 4.5678 ms]
                        change: [-0.7308% -0.2853% +0.2090%] (p = 0.27 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking logical_trivial_join_low_numbered_columns: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
logical_trivial_join_low_numbered_columns
                        time:   [1.3374 ms 1.3413 ms 1.3454 ms]
                        change: [-1.0865% -0.5004% +0.0116%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking logical_trivial_join_high_numbered_columns: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 50.
logical_trivial_join_high_numbered_columns
                        time:   [1.3660 ms 1.3684 ms 1.3712 ms]
                        change: [-1.5172% -1.0961% -0.6794%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

Benchmarking logical_aggregate_with_join: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
logical_aggregate_with_join
                        time:   [1.7815 ms 1.7837 ms 1.7859 ms]
                        change: [-2.1258% -1.5940% -1.1366%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q1   time:   [1.3576 ms 1.3596 ms 1.3617 ms]
                        change: [-1.0661% -0.4282% +0.3850%] (p = 0.26 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

Benchmarking physical_plan_tpch_q2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q2   time:   [1.3940 ms 1.3958 ms 1.3975 ms]
                        change: [-1.1803% -0.8095% -0.4489%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q3   time:   [1.2505 ms 1.2523 ms 1.2541 ms]
                        change: [-1.5359% -1.0619% -0.6131%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q4   time:   [1.2593 ms 1.2613 ms 1.2632 ms]
                        change: [-1.2966% -0.9442% -0.6166%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  8 (8.00%) high mild

Benchmarking physical_plan_tpch_q5: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q5   time:   [1.2976 ms 1.2992 ms 1.3009 ms]
                        change: [-1.2683% -0.8299% -0.3415%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

Benchmarking physical_plan_tpch_q6: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q6   time:   [1.1260 ms 1.1282 ms 1.1304 ms]
                        change: [-0.8453% -0.4227% -0.0237%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q7: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q7   time:   [1.4897 ms 1.4930 ms 1.4976 ms]
                        change: [-1.1966% -0.5377% +0.1408%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  6 (6.00%) high severe

Benchmarking physical_plan_tpch_q8: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q8   time:   [1.5160 ms 1.5185 ms 1.5217 ms]
                        change: [-0.7303% -0.3073% +0.2294%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking physical_plan_tpch_q9: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q9   time:   [1.4044 ms 1.4062 ms 1.4081 ms]
                        change: [-2.2499% -1.5619% -0.9930%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q10: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q10  time:   [1.3326 ms 1.3346 ms 1.3368 ms]
                        change: [-1.5355% -1.1162% -0.6136%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  7 (7.00%) high mild
  6 (6.00%) high severe

Benchmarking physical_plan_tpch_q11: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q11  time:   [1.2844 ms 1.2861 ms 1.2880 ms]
                        change: [-1.0891% -0.7848% -0.4596%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking physical_plan_tpch_q12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q12  time:   [1.2914 ms 1.2932 ms 1.2953 ms]
                        change: [-1.3540% -0.8637% -0.3279%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

Benchmarking physical_plan_tpch_q13: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q13  time:   [1.1891 ms 1.1914 ms 1.1937 ms]
                        change: [-1.2510% -0.8514% -0.3725%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q14: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q14  time:   [1.2280 ms 1.2300 ms 1.2322 ms]
                        change: [-0.8543% -0.3613% +0.2262%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  4 (4.00%) high severe

Benchmarking physical_plan_tpch_q16: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q16  time:   [1.2386 ms 1.2406 ms 1.2427 ms]
                        change: [-1.6075% -1.0106% -0.4775%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q17: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q17  time:   [1.2031 ms 1.2054 ms 1.2078 ms]
                        change: [-1.6412% -1.2875% -0.9239%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking physical_plan_tpch_q18: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q18  time:   [1.2832 ms 1.2851 ms 1.2871 ms]
                        change: [-1.2041% -0.8586% -0.5098%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

Benchmarking physical_plan_tpch_q19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q19  time:   [1.3537 ms 1.3567 ms 1.3602 ms]
                        change: [-4.1553% -2.8145% -1.6060%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

Benchmarking physical_plan_tpch_q20: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q20  time:   [1.2995 ms 1.3014 ms 1.3034 ms]
                        change: [-0.6861% -0.3842% -0.0702%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

Benchmarking physical_plan_tpch_q21: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q21  time:   [1.5220 ms 1.5242 ms 1.5265 ms]
                        change: [-4.0173% -3.0381% -2.1649%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking physical_plan_tpch_q22: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q22  time:   [1.3675 ms 1.3692 ms 1.3711 ms]
                        change: [-1.9314% -1.3749% -0.8150%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

physical_plan_tpch_all  time:   [27.848 ms 27.893 ms 27.941 ms]
                        change: [-0.7545% -0.5233% -0.2729%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

@alamb alamb changed the title Keep track of output equivalence_properties in ProjectionExec Cache output equivalence_properties in ProjectionExec Feb 1, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a very nice change to me @gruuya -- nice find

I wonder if we should add a similar optimization for other nodes where compuing this is non trivial (like HashJoinExec or HashAggregateExec or FilterExec)

I could imagine even changing the signature of ExecutionPlan::equivalence_properties to return a reference and basically force pre-computing the results 🤔

I agree with @metegenez on #9084 (comment) that it would be good if @mustafasrepo weighs in on this PR prior to merge

datafusion/physical-plan/src/projection.rs Outdated Show resolved Hide resolved
@gruuya
Copy link
Contributor Author

gruuya commented Feb 2, 2024

Thanks!

I wonder if we should add a similar optimization for other nodes where computing this is non trivial (like HashJoinExec or HashAggregateExec or FilterExec)

Yup, this makes sense to me. In fact AggregateExec also always computes the eq_props when created and then throws it away (thus forcing re-computing it later on).

I could imagine even changing the signature of ExecutionPlan::equivalence_properties to return a reference and basically force pre-computing the results 🤔

This also sounds good. Arguably for other nodes this might be forcing an eager computation of something that might not be used in theory, but in practice I think these two methods always get invoked during the planning process, at least to propagate the info up the plan tree to something that does cache it now (might also need some profiling/investigating here too).

I can try and make these additional changes separately if other people think it could be a good idea. I also think breaking up DefaultPhysicalPlanner::create_initial_plans LogicalPlan::Join case would be beneficial, since I think that's what's causing the stack overflow for the tpcds_physical_q64 test (which would then also cover the present issue).

@ozankabak
Copy link
Contributor

@alamb let's wait on merging this. We want to think a little about this and have ideas for a different general solution. We should have something next week.

@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

@alamb let's wait on merging this. We want to think a little about this and have ideas for a different general solution. We should have something next week.

This is fine with me if it is ok with @gruuya. Also given how small this PR is, if he needs the change sooner maybe we can consider merging it so it is available while the more general solution is underway. I don't feel strongly about this.

@ozankabak / @berkaysynnada is there a ticket somewhere that describes what you are doing ? It might help to make the plans more visible to others as well as I would like to point out we have a version of proejction pushdown in InfluxDB https://github.com/influxdata/influxdb/blob/main/iox_query/src/physical_optimizer/projection_pushdown.rs that might be worth a look as well

@berkaysynnada
Copy link
Contributor

@ozankabak / @berkaysynnada is there a ticket somewhere that describes what you are doing ?

We will open the PR to the upstream next week. I will also request API-level suggestions there. It's not expected to significantly alter our current plans, but it will be a solid step towards optimizing potential outcomes following other existing optimizations and future ones. Thanks for the IOx solution, I will examine it.

@ozankabak
Copy link
Contributor

@gruuya, let us know if it is very urgent for you. Then we can go ahead and merge for now and change again in a few days. Otherwise, we can collaborate next week on the more general approach we are working on.

@ozankabak / @berkaysynnada is there a ticket somewhere that describes what you are doing ?

We will open a ticket and a draft PR next week as an RFC for this

@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

We will open the PR to the upstream next week

Just to be clear, I was suggesting a ticket Before the PR. Some reasons a ticket prior to PR might be be beneficial are:

  1. It may avoid potentially unecessary work like this PR as others would know what is coming
  2. Could provide an area to receive feeback before you code a particular approach up
  3. Would give people a high level idea of what is coming, and thus making reviews faster

@ozankabak
Copy link
Contributor

BTW, for the sake clarity, there are two steps of the work here:

  1. Finding a general solution to expensive calls such as equivalence_properties, whose PR will come next week when @mustafasrepo comes back.
  2. Improving the projection pushdown code (whose issue @berkaysynnada will open shortly and will create a PR for next week)

These two are loosely related but independent tasks

@gruuya
Copy link
Contributor Author

gruuya commented Feb 2, 2024

@alamb, @ozankabak this is not that urgent for us (though it would certainly be nice to have a fix soon).

My impression from #9084 (comment) was that the general solution considered would actually be complementary to the one in this PR (i.e. locally caching stuff in the optimizer as opposed to caching it at the node level), hence possibly not necessitating changing it later on.

Otherwise, we can wait for the more general solution, thanks.

@berkaysynnada
Copy link
Contributor

Just to be clear, I was suggesting a ticket Before the PR. Some reasons a ticket prior to PR might be be beneficial are:

I thought it would be easier to discuss through code, but you are right in the importance of filling a ticket. I just mean that this PR has nothing to do with the projection optimizer I'm currently working on.

@alamb let's wait on merging this. We want to think a little about this and have ideas for a different general solution. We should have something next week.

"a different general solution" is not the projection optimizer 😁

I have opened the issue as you suggest: #9111.

@alamb alamb marked this pull request as draft February 2, 2024 21:47
@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

Converting to draft as we wait for additional potential improvements so it is clear this PR is not waiting on feedback

@ozankabak
Copy link
Contributor

Update: In a couple days, we will open a PR with a solution that takes the core idea of this and generalizes it in a neat way (we think). Stay tuned!

@gruuya
Copy link
Contributor Author

gruuya commented Feb 29, 2024

Superseded by #9346

@gruuya gruuya closed this Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exponential nature of ExecutionPlans output_partitioning and equivalence_properties
5 participants