Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inlined symbols #74554

Closed
wants to merge 3 commits into from
Closed

Conversation

nnethercote
Copy link
Contributor

The idea here is to encode symbols that are 4 bytes or shorter directly in the u32, and only use the hash table for longer symbols. Avoiding the hash table accesses should speed things up.

r? @ghost

@nnethercote
Copy link
Contributor Author

I was getting some slightly odd measurements for local runs, so lets see what CI says.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Jul 20, 2020

⌛ Trying commit f1c885b0d6ee42b0a966bcf7702f04a062751f75 with merge 0d5bd95f53618d3e3f1de22edfc7a99bc144ccff...

Comment on lines 1466 to 1488
let n = if len == 4 && s[3] != 0 && s[3] < 0x80 {
s[0] as u32 | ((s[1] as u32) << 8) | ((s[2] as u32) << 16) | ((s[3] as u32) << 24)
} else if len == 3 && s[2] != 0 {
s[0] as u32 | ((s[1] as u32) << 8) | ((s[2] as u32) << 16)
} else if len == 2 && s[1] != 0 {
s[0] as u32 | ((s[1] as u32) << 8)
} else if len == 1 && s[0] != 0 {
s[0] as u32
} else if len == 0 {
0u32
} else {
return None;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite to a match is nicer.

@bors
Copy link
Contributor

bors commented Jul 20, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 0d5bd95f53618d3e3f1de22edfc7a99bc144ccff (0d5bd95f53618d3e3f1de22edfc7a99bc144ccff)

@rust-timer
Copy link
Collaborator

Queued 0d5bd95f53618d3e3f1de22edfc7a99bc144ccff with parent 05630b0, future comparison URL.

@alex
Copy link
Member

alex commented Jul 20, 2020

With inlined storage, does it potentially make sense to try making this value a u64? (As a separate PR, naturally :-)) Presumably you get better inline rates, at the cost of more storage space.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (0d5bd95f53618d3e3f1de22edfc7a99bc144ccff): comparison url.

@nnethercote
Copy link
Contributor Author

With inlined storage, does it potentially make sense to try making this value a u64? (As a separate PR, naturally :-)) Presumably you get better inline rates, at the cost of more storage space.

Going from u32 to u64 increases the proportion of inlined symbols from ~50% to ~75%. But it also increases the size of some important types such as Token. I suspect it won't be a win, though I would have to try to be sure. I may do that after this PR lands, if it does. (Note also that changing to u64 isn't just a matter of changing a single constant; additional lines of code need to be written in a few places to handle the 5,6,7,8 cases.)

@nnethercote
Copy link
Contributor Author

The performance results look decent but the biggest improvements all accrue to the shortest-running benchmarks, which are mostly artificial. By the time we get to real programs, such as futures, the improvements are ~0.5% or less.

@rust-lang/wg-compiler-performance: any thoughts about the whether this perf improvement is worth the additional complexity? (The big new comment at the top of src/librustc_span/symbol.rs explains the change.)

@Mark-Simulacrum
Copy link
Member

Another possibility is to try packing more into the u32. If we constrain ourselves to something like [a-z_] (and we can throw in 6 more common characters, we'd need to look at what they are), we can pack into 5 bits -- that would mean that strings of length 6 rather than 4 would be able to fit into the u32 (with 2 bits left over for additional metadata).

That's a much more complex encoding, and playing with alternatives (e.g., including upper case letters and using 6 bits still lets us pack 5 characters) may be interesting too.

I suspect though that any scheme like this is unlikely to merit significant performance wins; it's only really useful during parsing when we're actively interning lots of strings, right? In that case it might be worth trying to eliminate the lock (or take it and stash the guard into e.g. ParseSess or something, potentially).

Do you have a table of common length strings? Would it be worthwhile to instead of coming up with encoding schemes like this to avoid hashing for some really common things, by hard-coding them? e.g., I could imagine std is super common and would make sense to avoid hashing by paying an up-front comparison for all strings with.

@nnethercote
Copy link
Contributor Author

The most common chars are [A-Za-z0-9_], which is 63 chars, i.e. 6 bits. So at best we could inline 5 chars, but I suspect the extra complication wouldn't be worth it -- the bit packing and extraction would be a lot more complex.

As for common symbols, we have the static symbol list at the top of src/librustc_span/symbol.rs which contains hundreds of common pre-interned symbols. I've already put in a lot of effort to use these symbols directly to avoid lots of interning/deinterning. There's very little more blood to squeeze from that stone.

Stashing the lock is difficult. It's extremely easy to call Symbol::intern() or Symbol::as_str() while you have it locked, and then you hit a run-time abort. (I recently tried doing exactly this on a much smaller piece of code than the parser, and hit that exact problem.) It's not feasible except for very simple functions where you can see exactly what code is run while the lock is held.

@Mark-Simulacrum
Copy link
Member

As for common symbols, we have the static symbol list at the top of src/librustc_span/symbol.rs which contains hundreds of common pre-interned symbols. I've already put in a lot of effort to use these symbols directly to avoid lots of interning/deinterning. There's very little more blood to squeeze from that stone.

Yes, I meant that when we run into these symbols in code (i.e., during parsing), we're currently hashing them to figure out the index to put into the Symbol, rather than e.g. directly comparing against a (much smaller) subset. Maybe there's no wins to be had here -- though it would entirely bypass the lock, it'd be a fairly high constant cost.

Stashing the lock is difficult. It's extremely easy to call Symbol::intern() or Symbol::as_str() while you have it locked, and then you hit a run-time abort. (I recently tried doing exactly this on a much smaller piece of code than the parser, and hit that exact problem.) It's not feasible except for very simple functions where you can see exactly what code is run while the lock is held.

Yeah, I thought this would be the case. I'm actually fairly surprised that taking the lock is expensive -- I'd expect it to be much cheaper than it seems to be from what you've said (and we've seen when removing it). In today's rustc, it's essentially just incrementing/decrementing a single integer.


I'm personally feeling like we could land this but I'm pretty ambivalent. It is a fairly significant increase in complexity, I think, for not too much in the way of gains.

@nnethercote
Copy link
Contributor Author

Accessing the table requires taking a lock (really just a RefCell borrow in a non-parallel rustc) and doing a TLS lookup. Cachegrind diffs indicate that it's the TLS lookup that's the expensive part. I should make that clearer in the comments.

@nnethercote
Copy link
Contributor Author

One argument in favour of landing is that the complexity is well-contained. Users of Symbol and Interner don't need to know about the encoding.

@Mark-Simulacrum
Copy link
Member

Thinking some more, it feels like we should land this, but make sure to try and do so with an implementation that is somewhat abstract in the sense that I'd like it to be not too hard to experiment with other inline storage methods (e.g., my ideas).

I'm happy to review that work.

@petrochenkov petrochenkov self-assigned this Jul 21, 2020
@petrochenkov petrochenkov added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 22, 2020
@petrochenkov
Copy link
Contributor

I wish we didn't have to do this, it's just too damn ugly.
Yes, Span uses the same technique, but it at least doesn't have to provide functionality like as_str or is_*_keyword.

@petrochenkov
Copy link
Contributor

If the majority of overhead comes from TLS lookup, then providing a direct access to the string interner (and other globals) through ParseSess (or some other session type) would be preferable long term (cc #74079 (comment)).

@petrochenkov petrochenkov added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 26, 2020
@petrochenkov
Copy link
Contributor

cc @rust-lang/compiler on evaluating the performance vs complexity tradeoff.
The perf results are in #74554 (comment), the improvements mostly appear on small synthetic programs.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 26, 2020

Not sure if this idea is nonsense: We could have a separate (non-tls) table for the pre-interned symbols and look up the string of all the pre-interned symbols in a different table. This would require an additional comparison operation per lookup and only give us a potential speed up for preinterned symbols, but it may give us the same speedups as this PR since most of the short symbols we're looking at are builtin ones I'd think?

@nnethercote
Copy link
Contributor Author

most of the short symbols we're looking at are builtin ones I'd think?

I strongly recommend doing some profiling to confirm this kind of assumption :)

@petrochenkov
Copy link
Contributor

Zulip discussion from the previous meeting - https://rust-lang.zulipchat.com/#narrow/stream/238009-t-compiler.2Fmeetings/topic/.5Bweekly.20meeting.5D.202020-07-30.20.2354818/near/205485071.
Looks like there's some support for landing this.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Aug 4, 2020
@petrochenkov
Copy link
Contributor

@nnethercote
I've submitted #75309 to generate is_used_keyword and similar classification functions automatically to avoid what 4896614 is doing.

In the meantime could you move all the non_ascii_idents stuff to a separate PR?

@petrochenkov petrochenkov added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 8, 2020
@nnethercote
Copy link
Contributor Author

@petrochenkov: thank you for doing #75309. I had though that a proc macro could probably improve the keyword categorization, but I didn't have the gumption to do it myself.

I am on PTO for the next two weeks so I won't get to this until after that. I'm still ambivalent about this, particularly because of my uncertainty in #74554 (comment). If I had to choose between eliminating SymbolStr or getting the small speedup from inlined symbols, I'd probably choose the former...

@bors
Copy link
Contributor

bors commented Aug 30, 2020

☔ The latest upstream changes (presumably #74862) made this pull request unmergeable. Please resolve the merge conflicts.

@nnethercote
Copy link
Contributor Author

@petrochenkov: I have incorporated your commits from #75309.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Aug 31, 2020

⌛ Trying commit 23888ae with merge 946d2e6f55e986fc0427f659bb269031b255369e...

@bors
Copy link
Contributor

bors commented Aug 31, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 946d2e6f55e986fc0427f659bb269031b255369e (946d2e6f55e986fc0427f659bb269031b255369e)

@rust-timer
Copy link
Collaborator

Queued 946d2e6f55e986fc0427f659bb269031b255369e with parent 022e1fe, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (946d2e6f55e986fc0427f659bb269031b255369e): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@nnethercote
Copy link
Contributor Author

The latest perf results are much worse -- small wins on doc builds, little change elsewhere. Not sure why. As it stands, definitely not worth the extra complexity.

@petrochenkov
Copy link
Contributor

@nnethercote

The latest perf results are much worse -- small wins on doc builds, little change elsewhere. Not sure why.

Perhaps due to #75813 "Lazy decoding of DefPathTable from crate metadata (non-incremental case)"?

The previous result (#74554 (comment)) showed improvements mostly for small crates, it means that they could be related to decoding symbols from metadata, but #75813 skips that metadata decoding entirely.

@nnethercote
Copy link
Contributor Author

Sounds plausible. I think we can close this.

@nnethercote nnethercote mentioned this pull request Dec 15, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 19, 2021
Remove `SymbolStr`

This was originally proposed in rust-lang#74554 (comment). As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences.

Best reviewed one commit at a time.

r? `@oli-obk`
bjorn3 pushed a commit to rust-lang/rustc_codegen_cranelift that referenced this pull request Dec 20, 2021
Remove `SymbolStr`

This was originally proposed in rust-lang/rust#74554 (comment). As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences.

Best reviewed one commit at a time.

r? `@oli-obk`
flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request Dec 30, 2021
Remove `SymbolStr`

This was originally proposed in rust-lang/rust#74554 (comment). As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences.

Best reviewed one commit at a time.

r? `@oli-obk`
bjorn3 pushed a commit to bjorn3/rustc_codegen_gcc that referenced this pull request Dec 30, 2021
Remove `SymbolStr`

This was originally proposed in rust-lang/rust#74554 (comment). As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences.

Best reviewed one commit at a time.

r? `@oli-obk`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants