Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for re-using monomorphizations in upstream crates. #48779

Merged
merged 16 commits into from
Apr 6, 2018

Conversation

michaelwoerister
Copy link
Member

@michaelwoerister michaelwoerister commented Mar 6, 2018

Followup to #48611. This implementation is pretty much finished modulo failing tests if there are any. Not quite ready for review yet though.

DESCRIPTION

This PR introduces a share-generics mode for RLIBs and Rust dylibs. When a crate is compiled in this mode, two things will happen:

  • before instantiating a monomorphization in the current crate, the compiler will look for that monomorphization in all upstream crates and link to it, if possible.
  • monomorphizations are not internalized during partitioning. Instead they are added to the list of symbols exported from the crate.

This results in less code being translated and LLVMed. However, there are also downsides:

  • it will impede optimization somewhat, since fewer functions can be internalized, and
  • Rust dylibs will have bigger symbol tables since they'll also export monomorphizations.

Consequently, this PR only enables the shared-generics mode for opt-levels No, Less, Size, and MinSize, and for when incremental compilation is activated. -O2 and -O3 will still generate generic functions per-crate.

Another thing to note is that this has a somewhat similar effect as MIR-only RLIBs, in that monomorphizations are shared, but it is less effective because it cannot share monomorphizations between sibling crates:

         A        <--- defines `fn foo<T>() { .. }`
       /   \
      /     \
     B       C    <--- both call `foo<u32>()`
      \     /
       \   /
         D        <--- calls `foo<u32>()` too

With share-generics, both B and C have to instantiate foo<u32> and only D can re-use it (from either B or C). With MIR-only RLIBs, B and C would not instantiate anything, and in D we would then only instantiate foo<u32> once.
On the other hand, when there are many leaf crates in the graph (e.g. when compiling many individual test binaries) then the share-generics approach will often be more effective.

TODO

  • Add codegen test that makes sure monomorphizations can be internalized in non-Rust binaries.
  • Add codegen-units test that makes sure we share generics.
  • Add run-make test that makes sure we don't export any monomorphizations from non-Rust binaries.
  • Review for reproducible-builds implications.

@rust-highfive
Copy link
Collaborator

r? @estebank

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 6, 2018
@michaelwoerister
Copy link
Member Author

@bors try

@bors
Copy link
Contributor

bors commented Mar 6, 2018

⌛ Trying commit 35c9b1f2f0a365187789fc684cdb5eba9afb81dd with merge 55221f5e2d6d5c71f6b89674eb29f2a213f415ef...

@bors
Copy link
Contributor

bors commented Mar 6, 2018

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister
Copy link
Member Author

@Mark-Simulacrum, could you do a perf run for this too, please?

@Mark-Simulacrum
Copy link
Member

Perf queued. Probably about 40-45 minutes until it starts.

@michaelwoerister
Copy link
Member Author

@michaelwoerister
Copy link
Member Author

@Mark-Simulacrum, the results don't seem to be available yet. Is it still in the queue or has something gone wrong?

@Mark-Simulacrum
Copy link
Member

Hm, it does look like something went wrong -- I've restarted the build.

@michaelwoerister
Copy link
Member Author

Will the link be the same?

@Mark-Simulacrum
Copy link
Member

Hm, it failed again -- I'm going to try and keep an eye on it and hopefully diagnose why, it also turns out we weren't properly logging the failures for try builds previously so I've now corrected that as well.

@Mark-Simulacrum
Copy link
Member

URL works now!

@michaelwoerister
Copy link
Member Author

Thanks, Mark!

@michaelwoerister
Copy link
Member Author

OK, so those numbers look good. I had hoped that they would be even better though. It looks like it's mostly small functions that get re-used. But yeah, -15.9% for tokio-web-push, I'll take it :)

@michaelwoerister
Copy link
Member Author

@rust-lang/compiler & @alexcrichton, do you have any objections to pursuing this further? There's a description at the top and performance numbers are here: http://perf.rust-lang.org/compare.html?start=6f2100b92cb14fbea2102701af6a3ac5814bd06c&end=55221f5e2d6d5c71f6b89674eb29f2a213f415ef&stat=instructions%3Au

@alexcrichton
Copy link
Member

Awesome work here @michaelwoerister! It's pretty neat how it's not to difficult to play around with various schemes like this these days :)

One concern I might have here is the size of binaries but given that this only affects debug mode rather than optimized then I guess it doesn't matter too much? We rely on -ffunction-sections to pretty aggressively prune dead code but if the symbols are all exported then I think even for binaries they'll stick around?

In general though seems like a great idea to me to keep pursuing, any bugs or surprises along the way we can probably smooth over!

For the diamond problem you gisted above, is this what all that "link once ODR" stuff is for in LLVM? I feel like that's all basically intended for optimized binaries linking only one copy rather than for debug mode, so it may not benefit us much if we don't turn this on in optimized mode. Speaking of optimized mode though, we may actually be able to get some nice wins here with available_externally in LLVM. That'd allow optimized mode to inline all the upstream copies but we get to completely skip codegen for them. Perhaps something to pursue down the road!

@michaelwoerister
Copy link
Member Author

We rely on -ffunction-sections to pretty aggressively prune dead code but if the symbols are all exported then I think even for binaries they'll stick around?

The monomorphizations are still assigned SymbolExportLevel::Rust, so for executables, cdylibs, and staticlibs, the linker script should hide them. I'll better add a test for that though.

We could look into LinkOnceODR but I remember it causing problems for MingGW. It would only reduce the size of linked artifacts, not compile times though, I think, as we'd still have to translate and optimize twice in the example above. Otoh, it might make symbol naming a bit less complicated.

@michaelwoerister
Copy link
Member Author

Also, thanks for the feedback, @alexcrichton! :)

@alexcrichton
Copy link
Member

Oh right that's true, I'd sort of doubt that LinkOnceODR is portable enough for us to effectively leverage it...

I think that for executables we don't currently pass linker scripts/symbol whitelists, but AFAIK that's because we just never have before. We could likely start now!

@michaelwoerister
Copy link
Member Author

I think that for executables we don't currently pass linker scripts/symbol whitelists

OK, I'll make sure we do as part of the PR.

@nikomatsakis
Copy link
Contributor

We discussed this in the @rust-lang/compiler meeting today. Everybody felt pretty good about it. It'd be nice to land this and possibly do further experimentation to see if we can enable in optimized builds without hurting perf.

@michaelwoerister michaelwoerister force-pushed the share-generics4 branch 2 times, most recently from be1e8f6 to d4264dc Compare March 13, 2018 13:50
@michaelwoerister michaelwoerister changed the title WIP: Allow for re-using monomorphizations in upstream crates. Allow for re-using monomorphizations in upstream crates. Mar 13, 2018
@michaelwoerister
Copy link
Member Author

@bors r=alexcrichton

@bors
Copy link
Contributor

bors commented Apr 6, 2018

📌 Commit 61991a5 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 6, 2018
@bors
Copy link
Contributor

bors commented Apr 6, 2018

⌛ Testing commit 61991a5 with merge 7678d50...

bors added a commit that referenced this pull request Apr 6, 2018
Allow for re-using monomorphizations in upstream crates.

Followup to #48611. This implementation is pretty much finished modulo failing tests if there are any. Not quite ready for review yet though.

### DESCRIPTION

This PR introduces a `share-generics` mode for RLIBs and Rust dylibs. When a crate is compiled in this mode, two things will happen:
- before instantiating a monomorphization in the current crate, the compiler will look for that monomorphization in all upstream crates and link to it, if possible.
- monomorphizations are not internalized during partitioning. Instead they are added to the list of symbols exported from the crate.

This results in less code being translated and LLVMed. However, there are also downsides:
- it will impede optimization somewhat, since fewer functions can be internalized, and
- Rust dylibs will have bigger symbol tables since they'll also export monomorphizations.

Consequently, this PR only enables the `shared-generics` mode for opt-levels `No`, `Less`, `Size`, and `MinSize`, and for when incremental compilation is activated. `-O2` and `-O3` will still generate generic functions per-crate.

Another thing to note is that this has a somewhat similar effect as MIR-only RLIBs, in that monomorphizations are shared, but it is less effective because it cannot share monomorphizations between sibling crates:

```
         A        <--- defines `fn foo<T>() { .. }`
       /   \
      /     \
     B       C    <--- both call `foo<u32>()`
      \     /
       \   /
         D        <--- calls `foo<u32>()` too
```

With `share-generics`, both `B` and `C` have to instantiate `foo<u32>` and only `D` can re-use it (from either `B` or `C`). With MIR-only RLIBs, `B` and `C` would not instantiate anything, and in `D` we would then only instantiate `foo<u32>` once.
On the other hand, when there are many leaf crates in the graph (e.g. when compiling many individual test binaries) then the `share-generics` approach will often be more effective.

### TODO
 - [x] Add codegen test that makes sure monomorphizations can be internalized in non-Rust binaries.
 - [x] Add codegen-units test that makes sure we share generics.
 - [x] Add run-make test that makes sure we don't export any monomorphizations from non-Rust binaries.
 - [x] Review for reproducible-builds implications.
@bors
Copy link
Contributor

bors commented Apr 6, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 7678d50 to master...

@@ -531,3 +530,9 @@ impl_stable_hash_for!(struct GeneratorData<'tcx> { layout });
// Tags used for encoding Spans:
pub const TAG_VALID_SPAN: u8 = 0;
pub const TAG_INVALID_SPAN: u8 = 1;

#[derive(RustcEncodable, RustcDecodable)]
pub struct EncodedExportedSymbols {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a comment on it, that it's used to avoid adding a 'tcx parameter to CrateRoot (which I'm not even sure is a problem, if covariant, we'd just store it as 'static).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.