Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc_macro: use crossbeam channels for the proc_macro cross-thread bridge #99123

Merged
merged 1 commit into from
Jul 30, 2022

Conversation

mystor
Copy link
Contributor

@mystor mystor commented Jul 10, 2022

This is done by having the crossbeam dependency inserted into the proc_macro server code from the server side, to avoid adding a dependency to proc_macro.

In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in #98186, #98187, #98188 and #98189, the performance of the executor should be much closer to same-thread execution.

In local testing, the crossbeam executor was substantially more performant than either of the two existing CrossThread strategies, so they have been removed to keep things simple.

r? @eddyb

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 10, 2022
@rustbot

This comment was marked as off-topic.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 10, 2022
@mystor
Copy link
Contributor Author

mystor commented Jul 10, 2022

I don't think it's worth doing a perf run with the cross-thread executor enabled until at least #98189 has landed, and this patch shouldn't impact perf at all for the existing same-thread executor. We can do a perf comparison with a PR changing the default after these patches have both landed to see how much more work is left before a cross-thread executor is a viable option.

Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and being able to test cross-thread proc macro isolation would be great, but I'm not sure if the new -Z flag requires an MCP (cc @rust-lang/compiler).

@eddyb
Copy link
Member

eddyb commented Jul 27, 2022

MCP has completed (see also the Zulip thread, which was mostly discussing the background of proc macro isolation):

Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me after rebase

compiler/rustc_expand/src/proc_macro.rs Outdated Show resolved Hide resolved
@rustbot
Copy link
Collaborator

rustbot commented Jul 29, 2022

Some changes occurred in library/proc_macro/src/bridge

cc @rust-lang/wg-rls-2

@rust-log-analyzer

This comment has been minimized.

…idge

This is done by having the crossbeam dependency inserted into the
proc_macro server code from the server side, to avoid adding a
dependency to proc_macro.

In addition, this introduces a -Z command-line option which will switch
rustc to run proc-macros using this cross-thread executor. With the
changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the
performance of the executor should be much closer to same-thread
execution.

In local testing, the crossbeam executor was substantially more
performant than either of the two existing CrossThread strategies, so
they have been removed to keep things simple.
@eddyb
Copy link
Member

eddyb commented Jul 29, 2022

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Jul 29, 2022

📌 Commit 6d1650f has been approved by eddyb

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 29, 2022
@bors
Copy link
Contributor

bors commented Jul 30, 2022

⌛ Testing commit 6d1650f with merge bd84c73...

@bors
Copy link
Contributor

bors commented Jul 30, 2022

☀️ Test successful - checks-actions
Approved by: eddyb
Pushing bd84c73 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 30, 2022
@bors bors merged commit bd84c73 into rust-lang:master Jul 30, 2022
@rustbot rustbot added this to the 1.64.0 milestone Jul 30, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (bd84c73): comparison url.

Instruction count

  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
0.9% 1.4% 11
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 0.9% 1.4% 11

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
2.1% 2.1% 1
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-3.2% -3.4% 3
All 😿🎉 (primary) 2.1% 2.1% 1

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
3.5% 7.0% 4
Regressions 😿
(secondary)
1.6% 1.6% 1
Improvements 🎉
(primary)
-4.4% -4.4% 1
Improvements 🎉
(secondary)
-3.1% -3.1% 1
All 😿🎉 (primary) 1.9% 7.0% 5

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot added the perf-regression Performance regression. label Jul 30, 2022
@nnethercote
Copy link
Contributor

this patch shouldn't impact perf at all for the existing same-thread executor.

html5ever-0.26.0 regressed uniformly. Any ideas why that might have happened?

Having said that, clicking on "show non-relevant results" shows some possible improvements. E.g. keccak had a bunch of ~2% improvements that for some reason didn't hit the significance threshold.

@eddyb
Copy link
Member

eddyb commented Jul 30, 2022

cc @wesleywiser Just noticed something here - so html5ever-0.26.0-debug incr-unchanged has the worse % increase in expand_crate, and I'm wondering if it would make sense to track more fine-grained aspects of expansion?
It would be useful to know, for example, how many proc macro invocations happened, how much they account for the expand_crate time, etc.


Regarding the apparent regression: if this is LLVM doing something weird, we could mark the cross-thread executor method as #[cold] #[inline(never)] perhaps?

@pnkfelix
Copy link
Member

I don't know if its helpful @eddyb , but I've recreated the regression locally, and I think the bulk of the increase in instruction count is coming from <rustc_expand::mbe::macro_parser::TtParser>::parse_tt for some reason:

Instruction Counts 8f68c43c bd84c73f difference
TOTAL 988,908,196 (100.0%) 1,002,719,736 (100.0%) +13,811,540
parse_tt 134,903,906 (13.64%) 149,152,914 (14.87%) +14,249,008

In any case, I'm going to take a shot now at marking the cross-thread executor method as #[cold] #[inline(never)] to see if that makes this issue go away here.

@pnkfelix
Copy link
Member

Unfortunately I took my shot (see below) at marking the cross-thread executor as #[cold] #[inline(never)] and it didn't seem to help.

diff
diff --git a/library/proc_macro/src/bridge/server.rs b/library/proc_macro/src/bridge/server.rs
index e068ec60b6a..a29fae17292 100644
--- a/library/proc_macro/src/bridge/server.rs
+++ b/library/proc_macro/src/bridge/server.rs
@@ -213,6 +213,8 @@ impl<P> ExecutionStrategy for CrossThread<P>
 where
     P: MessagePipe<Buffer> + Send + 'static,
 {
+    #[cold]
+    #[inline(never)]
     fn run_bridge_and_client(
         &self,
         dispatcher: &mut impl DispatcherTrait,

Comment on lines +1474 to +1476
proc_macro_execution_strategy: ProcMacroExecutionStrategy = (ProcMacroExecutionStrategy::SameThread,
parse_proc_macro_execution_strategy, [UNTRACKED],
"how to run proc-macro code (default: same-thread)"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pnkfelix I'm running out of ideas... what if you move this to the end? (as in, maybe the order of the fields or their offsets, in this huge struct, is causing weird things)

It could even be its presence - the flag could be removed and exec_strategy (the sole user of the flag, in compiler/rustc_expand/src/proc_macro.rs) made to, idk, use std::env::var instead? (or just hardcode the choice, but that might cause changes in optimizations)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wacky! I moved the field to the end of the struct, and that made the two methods with the two highest instruction counts (parse_tt and MacroRulesMacroExpander::expand) each have lower instruction-counts (Ir) according to cachegrind, but the overall instruction-count increased:

Instruction Counts 8f68c43c bd84c73f field-at-end
TOTAL 988,908,196 (100.0%) 1,002,719,736 (100.0%) 1,138,908,984 (100.0%)
parse_tt 134,903,906 (13.64%) 149,152,914 (14.87%) 99,794,157 (8.76%)
MacroRules
MacroExpander
::expand
56,387,220 (5.70%) 56,379,944 (5.62%) 50,969,953 (4.48%)
SipHasher128::finish128 5,975,227 (0.60%) 5,975,227 (0.60%) 36,508,882 (3.21%)
token_name_eq (not in report) (not in report) 29,133,905 (2.56%)
try_mark_previous_green 53,862,272 (5.45%) 53,872,906 (5.37%) 25,344,946 (2.23%)

I don't really want to spend too much more trying to dissect this thing that is probably just an LLVM codegen oddity, but I will at least try eliminating the field entirely as you suggest.

(As an aside, the fact that this kind of field shuffling had such an enormous impact on parse_tt does make me wonder if there's some kind of low-hanging fruit embedded in the code there, e.g. maybe the code is accessing the session when it should be locally stashing an unchanging value? But isn't that the kind of thing you'd like to believe your compiler is doing for you automatically? 😆 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the code is hot, and also that register spilling is happening in a different way or inlining choices are a little different. I.e. the kind of thing that's hard to control, and even if you manage to get it in the faster state now a tiny change later on might make it go back to the slower state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants