Dumb NRVO #72205

Merged
merged 11 commits into from
May 21, 2020

Conversation

ecstatic-morse
Contributor

@ecstatic-morse ecstatic-morse commented May 14, 2020

This is a very simple version of an NRVO pass. It scans backwards from the return terminator looking for an assignment like _0 = _1. If the scan reaches a basic block with two or more predecessors before it has seen an assignment to the return place, we bail out. This avoids running a full "reaching definitions" dataflow analysis.
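For readers who don't have MIR in their head, the backward scan described above can be sketched on a toy control-flow graph. This is only an illustration under stated assumptions: `Statement`, `Block`, `local_assigned_to_return_place` and the two example graphs are hypothetical names invented here, not rustc's data structures, and the toy model only knows about plain local-to-local copies.

```rust
// Toy model of MIR for illustration only. None of these types are rustc's
// real data structures; every name below is hypothetical.
type Local = usize;
type BlockId = usize;

/// `_0`, the return place.
const RETURN_PLACE: Local = 0;

#[derive(Clone, Copy)]
enum Statement {
    /// `_dest = _src`
    Assign { dest: Local, src: Local },
    /// Any statement that does not assign a whole local.
    Nop,
}

struct Block {
    statements: Vec<Statement>,
    predecessors: Vec<BlockId>,
}

/// Scans backwards from `return_block` and returns the local copied into
/// `_0`, or `None` if the scan has to give up.
fn local_assigned_to_return_place(blocks: &[Block], return_block: BlockId) -> Option<Local> {
    let mut visited = vec![false; blocks.len()];
    let mut current = return_block;
    // Stop if we come back to a block we have already scanned (a loop).
    while !std::mem::replace(&mut visited[current], true) {
        for stmt in blocks[current].statements.iter().rev() {
            if let Statement::Assign { dest: RETURN_PLACE, src } = *stmt {
                return Some(src);
            }
        }
        // No assignment in this block: keep going only if there is exactly
        // one way to get here. Zero or several predecessors means bail out.
        match blocks[current].predecessors.as_slice() {
            [pred] => current = *pred,
            _ => return None,
        }
    }
    None
}

/// bb0 -> bb1 { _0 = _1 } -> bb2 { return }
fn straight_line() -> Vec<Block> {
    vec![
        Block { statements: vec![Statement::Nop], predecessors: vec![] },
        Block { statements: vec![Statement::Assign { dest: 0, src: 1 }], predecessors: vec![0] },
        Block { statements: vec![Statement::Nop], predecessors: vec![1] },
    ]
}

/// bb0 branches to bb1 and bb2; both assign `_0 = _1` and jump to bb3 { return }.
fn diamond() -> Vec<Block> {
    let assign = Statement::Assign { dest: 0, src: 1 };
    vec![
        Block { statements: vec![], predecessors: vec![] },
        Block { statements: vec![assign], predecessors: vec![0] },
        Block { statements: vec![assign], predecessors: vec![0] },
        Block { statements: vec![], predecessors: vec![1, 2] },
    ]
}

fn main() {
    println!("{:?}", local_assigned_to_return_place(&straight_line(), 2));
    println!("{:?}", local_assigned_to_return_place(&diamond(), 3));
}
```

In the straight-line graph the scan finds `_1`. In the diamond it gives up at the join block even though both arms assign the same local, which is exactly the precision traded away by skipping the dataflow analysis.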

I wanted to see how much rustc would benefit from even a very limited version of this optimization. We should be able to use this as a point of comparison for more advanced versions that are based on live ranges.

r? @ghost

@jonas-schievink
Contributor

FWIW, here is the perf result of a rustc optimized with #71003. It shows basically no improvement (unless I configured the pass wrong again).

@ecstatic-morse
Contributor Author

ecstatic-morse commented May 14, 2020

Ah, that's surprising. I thought the perf run in #71003 (comment) indicated that there was a positive effect for some crates.

My hypothesis is that having one fewer local in the MIR makes codegen a bit faster, but the quality of the generated code (at least for rustc) is not improved. In fact, rustc seems a bit slower when it is built with NRVO, even when it is not itself running the NRVO pass. This is concerning to me. I'll have to look at some assembly to see what's going on, but maybe it would be worth checking whether the size of the returned type is above a certain threshold before trying NRVO?

@jonas-schievink
Contributor

Yeah, we also thought that the improvements happen because the NRVO pass simplifies MIR enough that it speeds up codegen and metadata/incremental en/decoding of it, and it might also be faster when doing const evaluation on it.

@ecstatic-morse
Contributor Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion

@bors
Contributor

bors commented May 14, 2020

⌛ Trying commit 20678a2347705402cc4c64b29f5d76b44077e36b with merge 23312ba3b09177c369a0e8ce37563b26993dbee2...

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-8 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

##[section]Starting: Linux x86_64-gnu-llvm-8
##[section]Starting: Initialize job
Agent name: 'Azure Pipelines 9'
Agent machine name: 'fv-az578'
Current agent version: '2.168.2'
##[group]Operating System
16.04.6
LTS
##[endgroup]
##[group]Virtual Environment
Environment: ubuntu-16.04
Version: 20200430.2
Included Software: https://github.com/actions/virtual-environments/blob/ubuntu16/20200430.2/images/linux/Ubuntu1604-README.md
##[endgroup]
Agent running as: 'vsts'
Prepare build directory.
Set build variables.
Download all required tasks.
Downloading task: Bash (3.163.2)
Checking job knob settings.
   Knob: AgentToolsDirectory = /opt/hostedtoolcache Source: ${AGENT_TOOLSDIRECTORY} 
   Knob: AgentPerflog = /home/vsts/perflog Source: ${VSTS_AGENT_PERFLOG} 
Start tracking orphan processes.
##[section]Finishing: Initialize job
##[section]Starting: Configure Job Name
==============================================================================
---
========================== Starting Command Output ===========================
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/2cd99d5a-3d37-45a5-b988-68cbf1dbaf28.sh

##[section]Finishing: Disable git automatic line ending conversion
##[section]Starting: Checkout rust-lang/rust@refs/pull/72205/merge to s
Task         : Get sources
Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
Version      : 1.0.0
Author       : Microsoft
---
##[command]git remote add origin https://github.com/rust-lang/rust
##[command]git config gc.auto 0
##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
##[command]git config --get-all http.proxy
##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/72205/merge:refs/remotes/pull/72205/merge
---
 ---> cb2676f08729
Step 5/8 : ENV RUST_CONFIGURE_ARGS       --build=x86_64-unknown-linux-gnu       --llvm-root=/usr/lib/llvm-8       --enable-llvm-link-shared       --set rust.thin-lto-import-instr-limit=10
 ---> Using cache
 ---> df25ce111862
Step 6/8 : ENV SCRIPT python2.7 ../x.py test --exclude src/tools/tidy &&            python2.7 ../x.py test src/test/mir-opt --pass=build                                   --target=armv5te-unknown-linux-gnueabi &&            python2.7 ../x.py test src/tools/tidy
 ---> 599b9ac96b27
Step 7/8 : ENV NO_DEBUG_ASSERTIONS=1
 ---> Using cache
 ---> 091087e35a36
---
   Compiling fmt_macros v0.0.0 (/checkout/src/libfmt_macros)
   Compiling rustc_ast_pretty v0.0.0 (/checkout/src/librustc_ast_pretty)
   Compiling chalk-rust-ir v0.10.0
   Compiling rustc_hir v0.0.0 (/checkout/src/librustc_hir)
   Compiling rustc_query_system v0.0.0 (/checkout/src/librustc_query_system)
   Compiling chalk-solve v0.10.0
   Compiling rustc_hir_pretty v0.0.0 (/checkout/src/librustc_hir_pretty)
   Compiling rustc_parse v0.0.0 (/checkout/src/librustc_parse)
   Compiling rustc_ast_lowering v0.0.0 (/checkout/src/librustc_ast_lowering)
---
   Compiling fmt_macros v0.0.0 (/checkout/src/libfmt_macros)
   Compiling chalk-rust-ir v0.10.0
   Compiling rustc_ast_pretty v0.0.0 (/checkout/src/librustc_ast_pretty)
   Compiling rustc_hir v0.0.0 (/checkout/src/librustc_hir)
   Compiling rustc_query_system v0.0.0 (/checkout/src/librustc_query_system)
   Compiling chalk-solve v0.10.0
   Compiling rustc_hir_pretty v0.0.0 (/checkout/src/librustc_hir_pretty)
   Compiling rustc_parse v0.0.0 (/checkout/src/librustc_parse)
   Compiling rustc_ast_lowering v0.0.0 (/checkout/src/librustc_ast_lowering)
---
......................................................i............................................. 1800/10164
.................................................................................................... 1900/10164
........................................................................i..i........................ 2000/10164
.................................................................................................... 2100/10164
..............................................................iiiii................................. 2200/10164
.................................................................................................... 2400/10164
.................................................................................................... 2500/10164
.................................................................................................... 2600/10164
.................................................................................................... 2700/10164
---
.................................................................................................... 5200/10164
.................................................................................................... 5300/10164
.........................i.......................................................................... 5400/10164
..................i................................................................................. 5500/10164
.........................ii.ii........i...i......................................................... 5600/10164
..........................................................................i......................... 5800/10164
.................................................................................................... 5900/10164
.....................ii.....................................i....................................... 6000/10164
.................................................................................................... 6100/10164
.................................................................................................... 6200/10164
..................................................................................ii...i..ii........ 6300/10164
.................................................................................................... 6500/10164
.................................................................................................... 6600/10164
.................................................................................................... 6700/10164
...............i..ii................................................................................ 6800/10164
.................................................................................................... 7000/10164
.....................................................................i.............................. 7100/10164
.................................................................................................... 7200/10164
.................................................................................................... 7300/10164
---
.................................................................................................... 8100/10164
.................................................................................................... 8200/10164
......................................................................................i............. 8300/10164
.................................................................................................... 8400/10164
........................................iiiiii.iiiii.i.............................................. 8500/10164
.................................................................................................... 8700/10164
.................................................................................................... 8800/10164
.................................................................................................... 8900/10164
.................................................................................................... 9000/10164
---
............................................................F....................................... 100/102
..
failures:

---- [mir-opt] mir-opt/inline/issue-58867-inline-as-ref-as-mut.rs stdout ----
8     let mut _4: &mut [T];                // in scope 0 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:5: 3:6
9     scope 1 {
10         debug self => _4;                // in scope 1 at $SRC_DIR/libcore/convert/mod.rs:LL:COL
-         let mut _5: &mut [T];            // in scope 1 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:5: 3:15
13 
14     bb0: {

16         StorageLive(_3);                 // scope 0 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:5: 3:15
17         StorageLive(_4);                 // scope 0 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:5: 3:6
18         _4 = &mut (*_1);                 // scope 0 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:5: 3:6
-         StorageLive(_5);                 // scope 1 at $SRC_DIR/libcore/convert/mod.rs:LL:COL
-         _5 = _4;                         // scope 1 at $SRC_DIR/libcore/convert/mod.rs:LL:COL
-         _3 = _5;                         // scope 1 at $SRC_DIR/libcore/convert/mod.rs:LL:COL
-         StorageDead(_5);                 // scope 1 at $SRC_DIR/libcore/convert/mod.rs:LL:COL
+         _3 = _4;                         // scope 1 at $SRC_DIR/libcore/convert/mod.rs:LL:COL
23         _2 = &mut (*_3);                 // scope 0 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:5: 3:15
24         StorageDead(_4);                 // scope 0 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:14: 3:15
25         _0 = &mut (*_2);                 // scope 0 at $DIR/issue-58867-inline-as-ref-as-mut.rs:3:5: 3:15

thread '[mir-opt] mir-opt/inline/issue-58867-inline-as-ref-as-mut.rs' panicked at 'Actual MIR output differs from expected MIR output /checkout/src/test/mir-opt/inline/issue-58867-inline-as-ref-as-mut/rustc.a.Inline.after.mir', src/tools/compiletest/src/runtest.rs:3166:25


failures:
    [mir-opt] mir-opt/inline/issue-58867-inline-as-ref-as-mut.rs

test result: FAILED. 101 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:348:22


command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/mir-opt" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/mir-opt" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "mir-opt" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-8/bin/FileCheck" "--nodejs" "/usr/bin/node" "--host-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "8.0.0" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --exclude src/tools/tidy
Build completed unsuccessfully in 1:01:05
== clock drift check ==
  local time: Thu May 14 19:27:28 UTC 2020
  network time: Thu, 14 May 2020 19:27:28 GMT
== end clock drift check ==

##[error]Bash exited with code '1'.
##[section]Finishing: Run build
##[section]Starting: Checkout rust-lang/rust@refs/pull/72205/merge to s
Task         : Get sources
Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
Version      : 1.0.0
Author       : Microsoft
Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
==============================================================================
Cleaning any cached credential from repository: rust-lang/rust (GitHub)
##[section]Finishing: Checkout rust-lang/rust@refs/pull/72205/merge to s
Cleaning up task key
Start cleaning up orphan processes.
Terminate orphan process: pid (3831) (python)
##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@ecstatic-morse
Contributor Author

ecstatic-morse commented May 14, 2020

So I've verified that we generate much better assembly for the following function:

fn foo(init: fn(&mut [u8; 1024])) -> [u8; 1024] {
    let mut buf = [0; 1024];
    init(&mut buf);
    buf
}

Obviously, simpler versions that call a concrete function or assign to buf directly in the body of foo do not see any improvement. I assume that most functions in rustc don't match this narrow case: they are either too complex for NRVO to trigger, or simple enough that some combination of inlining and LLVM's copy propagation can handle them.
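To try this locally, a minimal harness along the following lines should do. `foo` is copied from the comment above; `fill_with_sevens` is a hypothetical initializer made up for the example. The optimization does not change observable behaviour, so the assertion only confirms the function still works. Any difference would show up only in the generated assembly (for instance via `rustc -O --emit=asm`), which this sketch does not inspect.

```rust
/// The function from the comment above, unchanged.
fn foo(init: fn(&mut [u8; 1024])) -> [u8; 1024] {
    let mut buf = [0; 1024];
    init(&mut buf);
    buf
}

/// Hypothetical initializer, used only for this illustration.
fn fill_with_sevens(buf: &mut [u8; 1024]) {
    buf.fill(7);
}

fn main() {
    // `init` is passed as a function pointer, so `foo` cannot see through it
    // and `buf` has to exist as real memory before it is returned.
    let out = foo(fill_with_sevens);
    assert!(out.iter().all(|&b| b == 7));
    println!("first byte: {}", out[0]);
}
```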

#62446 is another example of a function that would benefit from even this dumb version of NRVO.

@bors
Contributor

bors commented May 14, 2020

☀️ Try build successful - checks-azure
Build commit: 23312ba3b09177c369a0e8ce37563b26993dbee2 (23312ba3b09177c369a0e8ce37563b26993dbee2)

@rust-timer
Collaborator

Queued 23312ba3b09177c369a0e8ce37563b26993dbee2 with parent af6d886, future comparison URL.

@MSxDOS

MSxDOS commented May 14, 2020

Does #57077 or Box::new() (#49733 (comment)) have any improvements with this?

@ecstatic-morse
Contributor Author

ecstatic-morse commented May 14, 2020

The call to ret_val in #57077 gets optimized to a thin wrapper around SomeExternFun with this PR. I'll label that issue appropriately.

If you have a specific example for Box, I can run it locally since I have everything built at the moment.

@MSxDOS

MSxDOS commented May 14, 2020

> If you have a specific example for Box, I can run it locally since I have everything built at the moment.

#57077 (comment) playground link for example, but AFAIR it doesn't require any specific steps to trigger the unnecessary memcpy.

@ecstatic-morse
Contributor Author

ecstatic-morse commented May 14, 2020

No change in generated ASM for either of the no_mangle functions.

This makes sense to me because the returned data is on the heap. It's not getting stored in the return place by this PR. Only the pointer is.
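A small sketch of why the return place only ever holds a pointer in the Box case. The function `boxed` is a hypothetical stand-in, not the code from the linked playground:

```rust
use std::mem::size_of;

/// Hypothetical example: the 1024-byte array lives on the heap, and the
/// value moved into the return place is just the `Box` pointer.
fn boxed() -> Box<[u8; 1024]> {
    let b = Box::new([0u8; 1024]);
    b
}

fn main() {
    // A `Box` of a sized type is a single pointer wide...
    assert_eq!(size_of::<Box<[u8; 1024]>>(), size_of::<usize>());
    // ...while the payload it points to is the full 1024 bytes.
    assert_eq!(size_of::<[u8; 1024]>(), 1024);
    assert_eq!(boxed().len(), 1024);
}
```

Renaming `b` to the return place can therefore save at most a pointer-sized copy. Any memcpy of the array into the heap allocation happens inside `Box::new` and is outside what this pass looks at.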

@rust-timer
Collaborator

Finished benchmarking try commit 23312ba3b09177c369a0e8ce37563b26993dbee2, comparison URL.

@ecstatic-morse
Contributor Author

ecstatic-morse commented May 15, 2020

So check builds are a bit faster, which I believe means that rustc itself has sped up, since we're not running any new code during check builds of the benchmark suite. debug and opt measurements will include the (minimal) overhead of this NRVO pass and any changes to codegen that result from it.

@rust-lang/wg-mir-opt Do we want to merge this sort of simplistic version of NRVO while we wait for a more robust one? I would need to update debuginfo and also validate the implicit assumption this PR makes about MIR building: that _0 is not used earlier in the function such that it cannot be renamed.

I think this might actually be worthwhile, despite the title of this PR, since it does resolve some open issues and appears to make rustc faster. However, it won't optimize some useful cases like chained assignments that would require copy propagation or variable coalescing.

Also, I want to note that this wasn't possible until relatively recently. #71005 was the last in a series of PRs that help to make this work. Thanks @jonas-schievink!

@jonas-schievink
Contributor

Whoa, nice!

src/librustc_mir/transform/nrvo.rs (outdated)
/// Looks at all basic blocks that are terminated with a `Return` to see what local is copied to
/// the return place in that basic block. If all `Return`-terminated blocks assign the same local
/// to the return place, return that local.
fn single_local_assigned_to_return_place(body: &mut mir::Body<'_>) -> Option<Local> {
Contributor

If there are any uses of _0 without it being a projectionless place on the lhs of an assignment, we should also return None. This may not be a problem right now, but it may very well occur after a combination of applying this optimization, then inlining the function the optimization was applied to, and then applying the optimization on the outer function.

Contributor Author

This is resolved via the IsReturnPlaceUsed visitor. It will also fire for assignments to projections of _0, although that's probably not obvious at first glance. I kinda want to change PlaceContext to remain unchanged instead of falling back to Projection when we see something like _1.field. I'd want to introduce a PlaceContext::Deref though.
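The rule from this review thread (any use of `_0` other than as the projection-free left-hand side of an assignment disqualifies the function) can be sketched on a toy model. This is not the `IsReturnPlaceUsed` visitor itself and does not use rustc's `PlaceContext`; `Place`, `Assign` and `return_place_used_elsewhere` are hypothetical names for illustration only.

```rust
// Toy model, not rustc's API: every name here is hypothetical.
type Local = usize;

/// `_0`, the return place.
const RETURN_PLACE: Local = 0;

/// A base local plus zero or more projections (field accesses, derefs, ...).
struct Place {
    local: Local,
    projection: Vec<&'static str>,
}

impl Place {
    fn bare(local: Local) -> Self {
        Place { local, projection: Vec::new() }
    }

    fn field(local: Local, name: &'static str) -> Self {
        Place { local, projection: vec![name] }
    }
}

/// `dest = src`
struct Assign {
    dest: Place,
    src: Place,
}

/// Returns `true` if `_0` is read anywhere, or written through a projection
/// such as `_0.x = ...`. Only bare `_0 = ...` assignments are tolerated.
fn return_place_used_elsewhere(stmts: &[Assign]) -> bool {
    stmts.iter().any(|stmt| {
        let read = stmt.src.local == RETURN_PLACE;
        let projected_write =
            stmt.dest.local == RETURN_PLACE && !stmt.dest.projection.is_empty();
        read || projected_write
    })
}

fn main() {
    let plain = [Assign { dest: Place::bare(0), src: Place::bare(1) }];
    let projected = [Assign { dest: Place::field(0, "x"), src: Place::bare(1) }];
    let read = [Assign { dest: Place::bare(2), src: Place::bare(0) }];

    assert!(!return_place_used_elsewhere(&plain)); // `_0 = _1` is fine
    assert!(return_place_used_elsewhere(&projected)); // `_0.x = _1` is a use
    assert!(return_place_used_elsewhere(&read)); // `_2 = _0` is a use
}
```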

@jonas-schievink
Contributor

@ecstatic-morse Oh, just remembered: What I measured for my PR above is not quite the same as what you measured here: I only ran the NRVO pass on rustc, not on any crates compiled by it, but you always run it. This might explain the performance difference. Your pass is cheap to run and presumably removes enough MIR to speed up further operations (codegen, const eval, (de)serialization). It might not speed up rustc in general though.

@ecstatic-morse
Contributor Author

@jonas-schievink Do we run MIR optimizations for check builds?

@jonas-schievink
Contributor

At least for const eval, we do, yes

@ecstatic-morse
Contributor Author

ecstatic-morse commented May 15, 2020

That could partially explain it, but I'm not convinced that rustc isn't faster. For most of the benchmarks, it's not just the queries that run post-optimization that have sped up but things like mir_borrowck and typeck_tables_of. I will do a more robust perf run that disables the optimization while running the benchmarks. How did you do this in #71003?

@jonas-schievink
Contributor

Oh it's really gross:

    if !tcx.crate_name(LOCAL_CRATE).as_str().starts_with("rustc_") {
        // Only run this pass on the compiler.
        return;
    }

@ecstatic-morse
Contributor Author

ecstatic-morse commented May 15, 2020

Don't we miss the standard library with that?

@jonas-schievink
Contributor

jonas-schievink commented May 15, 2020

Yeah. I left it out because I didn't think NRVO would do too much on it. Do you think it will benefit from NRVO a lot?

@bors
Contributor

bors commented May 21, 2020

⌛ Testing commit 856cd66 with merge f52a10e5837d9b7ae995e1548ff729a31c595f51...

@bors
Contributor

bors commented May 21, 2020

💔 Test failed - checks-azure

@bors added the S-waiting-on-review label and removed the S-waiting-on-bors label May 21, 2020
@ecstatic-morse
Contributor Author

ecstatic-morse commented May 21, 2020

Hmm. Apparently I didn't get to rustc_ast_lowering when testing a local stage 1 build. It seems the return place is sometimes assigned a local with a different type, in this case &T instead of &mut T. I've updated the code to skip the optimization instead of panicking when this occurs. I'm going to tentatively reapprove this, but we should figure out what to do here long-term.

@bors r=oli-obk

@bors
Contributor

bors commented May 21, 2020

📌 Commit f509862 has been approved by oli-obk

@bors added the S-waiting-on-bors label and removed the S-waiting-on-review label May 21, 2020
@bors
Contributor

bors commented May 21, 2020

💡 This pull request was already approved, no need to approve it again.

@bors
Contributor

bors commented May 21, 2020

📌 Commit f509862 has been approved by oli-obk

@RalfJung
Member

@ecstatic-morse assignment of &mut T values to &T places can happen after optimizations turned &*x into x. Also see this function in Miri.
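At the source level, the situation described in the two comments above looks roughly like the sketch below. `as_shared` shows a reborrow whose result type is `&T` even though the operand is `&mut T`; once an optimization rewrites `&*x` to `x`, the MIR is left copying a `&mut T` value into a `&T` place. `Ty` and `can_rename` are hypothetical, heavily simplified stand-ins for the "skip instead of panicking" check, not the code that landed.

```rust
/// `&*x` has type `&T` although `x` itself is `&mut T`.
fn as_shared<T>(x: &mut T) -> &T {
    &*x
}

/// Hypothetical, simplified stand-in for the type of a MIR local.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    SharedRef,
    MutRef,
}

/// Sketch of the fix described above: decline the renaming when the types of
/// the return place and the candidate local differ, rather than panicking.
fn can_rename(return_place_ty: Ty, local_ty: Ty) -> bool {
    return_place_ty == local_ty
}

fn main() {
    let mut value = 5;
    assert_eq!(*as_shared(&mut value), 5);

    // `_0: &T` but the assigned local is `&mut T`: skip the optimization.
    assert!(!can_rename(Ty::SharedRef, Ty::MutRef));
    assert!(can_rename(Ty::MutRef, Ty::MutRef));
}
```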

@bors
Contributor

bors commented May 21, 2020

⌛ Testing commit f509862 with merge 7f79e98...

@bors
Contributor

bors commented May 21, 2020

☀️ Test successful - checks-azure
Approved by: oli-obk
Pushing 7f79e98 to master...

@bors added the merged-by-bors label May 21, 2020
@bors merged commit 7f79e98 into rust-lang:master May 21, 2020
@bluss added the relnotes-perf label May 21, 2020
@ecstatic-morse deleted the nrvo branch May 21, 2020 17:12
@nnethercote
Contributor

The final perf win from the landing. Nice work!

ptersilie added a commit to ptersilie/yk that referenced this pull request May 28, 2020
This change replaces the default return variable `$0` with the variable
in the outer context where the return value will end up after leaving
the function. This saves us an instruction when we compile the trace.
More importantly however, this guards us against a future optimisation
in rustc that allows SIR to assign to $0 multiple times and at the
beginning of a block, which could lead to another function overwriting
its value (see rust-lang/rust#72205).
ptersilie added a commit to ptersilie/yk that referenced this pull request Jun 1, 2020
bors bot added a commit to ykjit/yk that referenced this pull request Jun 1, 2020
67: Replace return variable with its destination. r=vext01 a=ptersilie

This change replaces the default return variable `$0` with the variable
in the outer context where the return value will end up after leaving
the function. This saves us an instruction when we compile the trace.
More importantly however, this guards us against a future optimisation
in rustc that allows SIR to assign to $0 multiple times and at the
beginning of a block, which could lead to another function overwriting
its value (see rust-lang/rust#72205).

While we are here, also fix a bug in the variable renaming code.
Currently, functions that are called consecutively (i.e. they are not
nested) use the same offset to rename their variables (the offset of
their outer context), which leads to them sharing the same variable
names. By using an accumulator which is continuously increased to
calculate the offset, we make sure consecutive functions have increasing
variable names, even when after leaving a function the offset is
temporarily reset to that of the outer context.

Co-authored-by: Lukas Diekmann <lukas.diekmann@gmail.com>