Inlined symbols #74554

nnethercote · 2020-07-20T12:32:48Z

The idea here is to encode symbols that are 4 bytes or shorter directly in the u32, and only use the hash table for longer symbols. Avoiding the hash table accesses should speed things up.

r? @ghost

nnethercote · 2020-07-20T12:33:24Z

I was getting some slightly odd measurements for local runs, so let's see what CI says.

@bors try @rust-timer queue

rust-timer · 2020-07-20T12:33:25Z

Awaiting bors try build completion

bors · 2020-07-20T12:33:36Z

⌛ Trying commit f1c885b0d6ee42b0a966bcf7702f04a062751f75 with merge 0d5bd95f53618d3e3f1de22edfc7a99bc144ccff...

tesuji · 2020-07-20T14:23:34Z

src/librustc_span/symbol.rs

+        let n = if len == 4 && s[3] != 0 && s[3] < 0x80 {
+            s[0] as u32 | ((s[1] as u32) << 8) | ((s[2] as u32) << 16) | ((s[3] as u32) << 24)
+        } else if len == 3 && s[2] != 0 {
+            s[0] as u32 | ((s[1] as u32) << 8) | ((s[2] as u32) << 16)
+        } else if len == 2 && s[1] != 0 {
+            s[0] as u32 | ((s[1] as u32) << 8)
+        } else if len == 1 && s[0] != 0 {
+            s[0] as u32
+        } else if len == 0 {
+            0u32
+        } else {
+            return None;
+        };


Rewriting this as a match would be nicer.
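
For illustration, a match-based version of the snippet above might look like the following sketch. The function name `encode_inline` and the `Option<u32>` return type are assumptions made for the example, not taken from the PR; the guards mirror the ones in the quoted `if`/`else` chain.

```rust
/// Hypothetical helper (the name is illustrative, not from the PR).
/// Packs a byte string of at most 4 bytes into a `u32`, first byte lowest.
/// Returns `None` when the string cannot be stored inline.
fn encode_inline(s: &[u8]) -> Option<u32> {
    let n = match *s {
        // The empty string is encoded as zero.
        [] => 0,
        // The final byte must be non-zero in every non-empty case.
        [a] if a != 0 => a as u32,
        [a, b] if b != 0 => a as u32 | (b as u32) << 8,
        [a, b, c] if c != 0 => a as u32 | (b as u32) << 8 | (c as u32) << 16,
        // With 4 bytes the last one must also be ASCII, keeping the top bit clear.
        [a, b, c, d] if d != 0 && d < 0x80 => {
            a as u32 | (b as u32) << 8 | (c as u32) << 16 | (d as u32) << 24
        }
        // Longer strings, or strings failing a guard, are not inlined.
        _ => return None,
    };
    Some(n)
}
```

Slice patterns make the length of each case explicit, and the guards stay next to the bytes they test.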

bors · 2020-07-20T14:42:25Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: 0d5bd95f53618d3e3f1de22edfc7a99bc144ccff

rust-timer · 2020-07-20T14:42:27Z

Queued 0d5bd95f53618d3e3f1de22edfc7a99bc144ccff with parent 05630b0, future comparison URL.

alex · 2020-07-20T17:13:52Z

With inlined storage, does it potentially make sense to try making this value a u64? (As a separate PR, naturally :-)) Presumably you get better inline rates, at the cost of more storage space.

rust-timer · 2020-07-20T18:00:16Z

Finished benchmarking try commit (0d5bd95f53618d3e3f1de22edfc7a99bc144ccff): comparison url.

nnethercote · 2020-07-20T20:48:36Z

> With inlined storage, does it potentially make sense to try making this value a u64? (As a separate PR, naturally :-)) Presumably you get better inline rates, at the cost of more storage space.

Going from u32 to u64 increases the proportion of inlined symbols from ~50% to ~75%. But it also increases the size of some important types such as Token. I suspect it won't be a win, though I would have to try it to be sure. I may do that after this PR lands, if it does. (Note also that changing to u64 isn't just a matter of changing a single constant; additional lines of code need to be written in a few places to handle the 5-, 6-, 7-, and 8-byte cases.)
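
As a rough illustration of what a u64 variant involves, here is a loop-based sketch that avoids writing out one case per length. Both function names are hypothetical, and the rule that the last byte must be non-zero (and ASCII when all 8 bytes are used) is carried over from the u32 snippet quoted earlier in the thread as an assumption.

```rust
/// Hypothetical u64 encoder: packs up to 8 bytes, first byte lowest.
fn encode_inline_u64(s: &[u8]) -> Option<u64> {
    let fits = match s.last() {
        // The empty string is encoded as zero.
        None => true,
        // Last byte non-zero so the length is recoverable; when all 8 bytes
        // are used it must also be ASCII so the top bit stays clear.
        Some(&last) => s.len() <= 8 && last != 0 && (s.len() < 8 || last < 0x80),
    };
    if !fits {
        return None;
    }
    // Byte `i` goes into bits `8*i .. 8*i + 8`.
    Some(s.iter().enumerate().fold(0u64, |n, (i, &b)| n | (b as u64) << (8 * i)))
}

/// Recovers the length of an inlined string from its encoding: every unused
/// high byte is zero and the last used byte is non-zero.
fn inline_len(n: u64) -> usize {
    8 - (n.leading_zeros() / 8) as usize
}
```

A loop like this trades the explicit per-length arms for a single check, which may or may not be faster in practice; that would need measuring.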

nnethercote · 2020-07-20T21:07:36Z

The performance results look decent but the biggest improvements all accrue to the shortest-running benchmarks, which are mostly artificial. By the time we get to real programs, such as futures, the improvements are ~0.5% or less.

@rust-lang/wg-compiler-performance: any thoughts about whether this perf improvement is worth the additional complexity? (The big new comment at the top of src/librustc_span/symbol.rs explains the change.)

Mark-Simulacrum · 2020-07-20T21:10:26Z

Another possibility is to try packing more into the u32. If we constrain ourselves to something like [a-z_] (and we can throw in 5 more common characters, we'd need to look at what they are), we can pack each character into 5 bits -- that would mean that strings of length 6 rather than 4 would be able to fit into the u32 (with 2 bits left over for additional metadata).

That's a much more complex encoding, and playing with alternatives (e.g., including upper case letters and using 6 bits still lets us pack 5 characters) may be interesting too.
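
A minimal sketch of the 5-bit idea, under assumptions not stated in the thread: the functions `code`, `pack5` and `unpack5` are hypothetical, code 0 is reserved as padding so the length can be recovered, and only `[a-z_]` is mapped (codes 1 to 27), leaving codes 28 to 31 unused.

```rust
/// Maps a byte in [a-z_] to a 5-bit code in 1..=27; 0 is reserved as padding.
fn code(b: u8) -> Option<u32> {
    match b {
        b'a'..=b'z' => Some((b - b'a') as u32 + 1),
        b'_' => Some(27),
        _ => None,
    }
}

/// Packs up to 6 characters into the low 30 bits of a u32.
fn pack5(s: &str) -> Option<u32> {
    if s.len() > 6 {
        return None;
    }
    let mut n = 0u32;
    for (i, b) in s.bytes().enumerate() {
        // Character `i` occupies bits `5*i .. 5*i + 5`.
        n |= code(b)? << (5 * i);
    }
    Some(n)
}

/// Inverse of `pack5`: reads 5-bit codes until it hits padding.
fn unpack5(mut n: u32) -> String {
    let mut out = String::new();
    loop {
        match n & 0x1f {
            c @ 1..=26 => out.push((b'a' + c as u8 - 1) as char),
            27 => out.push('_'),
            _ => break, // padding (0) or an unused code
        }
        n >>= 5;
    }
    out
}
```

For example, `pack5("string")` succeeds while `pack5("Std")` returns `None`, because upper-case letters are outside the assumed alphabet. The top 2 bits of the result are always clear and remain free for metadata.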

I suspect though that any scheme like this is unlikely to yield significant performance wins; it's only really useful during parsing when we're actively interning lots of strings, right? In that case it might be worth trying to eliminate the lock (or take it and stash the guard into e.g. ParseSess or something, potentially).

Do you have a table of common length strings? Instead of coming up with encoding schemes like this, would it be worthwhile to avoid hashing for some really common things by hard-coding them? E.g., I could imagine std is super common, and it might make sense to avoid hashing it by paying for an up-front comparison on all strings.

nnethercote · 2020-07-20T21:31:10Z

The most common chars are [A-Za-z0-9_], which is 63 chars, i.e. 6 bits. So at best we could inline 5 chars, but I suspect the extra complication wouldn't be worth it -- the bit packing and extraction would be a lot more complex.

As for common symbols, we have the static symbol list at the top of src/librustc_span/symbol.rs which contains hundreds of common pre-interned symbols. I've already put in a lot of effort to use these symbols directly to avoid lots of interning/deinterning. There's very little more blood to squeeze from that stone.

Stashing the lock is difficult. It's extremely easy to call Symbol::intern() or Symbol::as_str() while you have it locked, and then you hit a run-time abort. (I recently tried doing exactly this on a much smaller piece of code than the parser, and hit that exact problem.) It's not feasible except for very simple functions where you can see exactly what code is run while the lock is held.
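
The failure mode can be reproduced with a small stand-in interner. Everything in this sketch is hypothetical (the `INTERNER` thread-local, `intern`, `intern_while_stashed`); it uses `try_borrow_mut` so that the conflict shows up as `None` instead of the panic a plain `borrow_mut` would cause.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Stand-in for the compiler's interner: a thread-local table behind a RefCell.
    static INTERNER: RefCell<HashMap<String, u32>> = RefCell::new(HashMap::new());
}

/// Interns `s`, or returns `None` if the table is already mutably borrowed.
fn intern(s: &str) -> Option<u32> {
    INTERNER.with(|cell| {
        let mut map = cell.try_borrow_mut().ok()?;
        let next = map.len() as u32;
        Some(*map.entry(s.to_string()).or_insert(next))
    })
}

/// Simulates "stashing the lock": the guard is held while other code runs,
/// and that code happens to call `intern` again.
fn intern_while_stashed(s: &str) -> Option<u32> {
    INTERNER.with(|cell| {
        let _guard = cell.borrow_mut(); // the stashed guard
        intern(s) // the nested borrow fails while `_guard` is alive
    })
}
```

Here `intern("foo")` succeeds on its own, while `intern_while_stashed("foo")` returns `None`. With `borrow_mut` in place of `try_borrow_mut`, the nested call would panic at run time.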

Mark-Simulacrum · 2020-07-20T22:31:23Z

> As for common symbols, we have the static symbol list at the top of src/librustc_span/symbol.rs which contains hundreds of common pre-interned symbols. I've already put in a lot of effort to use these symbols directly to avoid lots of interning/deinterning. There's very little more blood to squeeze from that stone.

Yes, I meant that when we run into these symbols in code (i.e., during parsing), we're currently hashing them to figure out the index to put into the Symbol, rather than e.g. directly comparing against a (much smaller) subset. Maybe there are no wins to be had here -- though it would entirely bypass the lock, it'd be a fairly high constant cost.

> Stashing the lock is difficult. It's extremely easy to call Symbol::intern() or Symbol::as_str() while you have it locked, and then you hit a run-time abort. (I recently tried doing exactly this on a much smaller piece of code than the parser, and hit that exact problem.) It's not feasible except for very simple functions where you can see exactly what code is run while the lock is held.

Yeah, I thought this would be the case. I'm actually fairly surprised that taking the lock is expensive -- I'd expect it to be much cheaper than it seems to be from what you've said (and we've seen when removing it). In today's rustc, it's essentially just incrementing/decrementing a single integer.

I'm personally feeling like we could land this but I'm pretty ambivalent. It is a fairly significant increase in complexity, I think, for not too much in the way of gains.

nnethercote · 2020-07-20T22:38:07Z

Accessing the table requires taking a lock (really just a RefCell borrow in a non-parallel rustc) and doing a TLS lookup. Cachegrind diffs indicate that it's the TLS lookup that's the expensive part. I should make that clearer in the comments.

nnethercote · 2020-07-20T22:39:49Z

One argument in favour of landing is that the complexity is well-contained. Users of Symbol and Interner don't need to know about the encoding.
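
To illustrate the containment argument, here is a sketch of a `Symbol` whose public methods hide whether the string lives inline or in a table. It is not the PR's implementation: the `TABLE_TAG` bit, the `Vec`-backed table with linear search, and the `as_string` and `is_inline` methods are all assumptions made to keep the example short.

```rust
use std::cell::RefCell;

// Assumption: table-backed symbols are marked by setting the top bit.
const TABLE_TAG: u32 = 0x8000_0000;

thread_local! {
    // Stand-in for the real interner; linear search keeps the sketch short.
    static TABLE: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(u32);

impl Symbol {
    pub fn intern(s: &str) -> Symbol {
        let b = s.as_bytes();
        // Same conditions as the quoted snippet: at most 4 bytes, last byte
        // non-zero, and ASCII when all 4 bytes are used.
        let inline = match b.last() {
            None => true,
            Some(&last) => b.len() <= 4 && last != 0 && (b.len() < 4 || last < 0x80),
        };
        if inline {
            let mut bytes = [0u8; 4];
            bytes[..b.len()].copy_from_slice(b);
            return Symbol(u32::from_le_bytes(bytes));
        }
        TABLE.with(|t| {
            let mut t = t.borrow_mut();
            let found = t.iter().position(|e| e.as_str() == s);
            let idx = match found {
                Some(i) => i,
                None => {
                    t.push(s.to_string());
                    t.len() - 1
                }
            };
            Symbol(TABLE_TAG | idx as u32)
        })
    }

    pub fn as_string(self) -> String {
        if self.is_inline() {
            // Unused high bytes are zero, so the length falls out of the value.
            let len = 4 - (self.0.leading_zeros() / 8) as usize;
            String::from_utf8_lossy(&self.0.to_le_bytes()[..len]).into_owned()
        } else {
            TABLE.with(|t| t.borrow()[(self.0 & !TABLE_TAG) as usize].clone())
        }
    }

    pub fn is_inline(self) -> bool {
        (self.0 & TABLE_TAG) == 0
    }
}
```

Callers only see `Symbol::intern` and `as_string`; `Symbol::intern("std")` never touches the table, while `Symbol::intern("Symbol")` does.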

Mark-Simulacrum · 2020-07-20T22:49:39Z

Thinking some more, it feels like we should land this, but make sure to try and do so with an implementation that is somewhat abstract in the sense that I'd like it to be not too hard to experiment with other inline storage methods (e.g., my ideas).

I'm happy to review that work.

petrochenkov · 2020-07-26T15:15:15Z

I wish we didn't have to do this, it's just too damn ugly.
Yes, Span uses the same technique, but it at least doesn't have to provide functionality like as_str or is_*_keyword.

petrochenkov · 2020-07-26T15:20:17Z

If the majority of the overhead comes from the TLS lookup, then providing direct access to the string interner (and other globals) through ParseSess (or some other session type) would be preferable long term (cc #74079 (comment)).

petrochenkov · 2020-07-26T15:24:18Z

cc @rust-lang/compiler on evaluating the performance vs complexity tradeoff.
The perf results are in #74554 (comment), the improvements mostly appear on small synthetic programs.

oli-obk · 2020-07-26T15:41:35Z

Not sure if this idea is nonsense: We could have a separate (non-tls) table for the pre-interned symbols and look up the string of all the pre-interned symbols in a different table. This would require an additional comparison operation per lookup and only give us a potential speed up for preinterned symbols, but it may give us the same speedups as this PR since most of the short symbols we're looking at are builtin ones I'd think?
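
A sketch of how such a split might look, with every name hypothetical: a plain `static` array holds the pre-interned strings, and a thread-local table holds everything else. The three entries in `PREINTERNED` are placeholders, not the compiler's actual list.

```rust
use std::cell::RefCell;

// Hypothetical pre-interned list; a plain static needs no TLS lookup.
static PREINTERNED: [&str; 3] = ["", "std", "self"];

thread_local! {
    // Dynamically interned strings still live behind TLS and a RefCell.
    static DYNAMIC: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Sym(u32);

fn intern(s: &str) -> Sym {
    // Indices below PREINTERNED.len() refer to the static table.
    if let Some(i) = PREINTERNED.iter().position(|&p| p == s) {
        return Sym(i as u32);
    }
    DYNAMIC.with(|d| {
        let mut d = d.borrow_mut();
        let found = d.iter().position(|e| e.as_str() == s);
        let idx = match found {
            Some(i) => i,
            None => {
                d.push(s.to_string());
                d.len() - 1
            }
        };
        Sym((PREINTERNED.len() + idx) as u32)
    })
}

fn as_string(sym: Sym) -> String {
    let i = sym.0 as usize;
    // One extra comparison per lookup decides which table to read.
    if i < PREINTERNED.len() {
        PREINTERNED[i].to_string()
    } else {
        DYNAMIC.with(|d| d.borrow()[i - PREINTERNED.len()].clone())
    }
}
```

Lookups of pre-interned symbols skip the thread-local access entirely; whether that covers most short symbols in practice is exactly what would need profiling.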

nnethercote · 2020-07-27T02:36:32Z

> most of the short symbols we're looking at are builtin ones I'd think?

I strongly recommend doing some profiling to confirm this kind of assumption :)

petrochenkov · 2020-08-04T20:48:15Z

Zulip discussion from the previous meeting - https://rust-lang.zulipchat.com/#narrow/stream/238009-t-compiler.2Fmeetings/topic/.5Bweekly.20meeting.5D.202020-07-30.20.2354818/near/205485071.
Looks like there's some support for landing this.

petrochenkov · 2020-08-08T22:07:47Z

@nnethercote
I've submitted #75309 to generate is_used_keyword and similar classification functions automatically to avoid what 4896614 is doing.

In the meantime could you move all the non_ascii_idents stuff to a separate PR?

nnethercote · 2020-08-09T02:05:28Z

@petrochenkov: thank you for doing #75309. I had thought that a proc macro could probably improve the keyword categorization, but I didn't have the gumption to do it myself.

I am on PTO for the next two weeks so I won't get to this until after that. I'm still ambivalent about this, particularly because of my uncertainty in #74554 (comment). If I had to choose between eliminating SymbolStr or getting the small speedup from inlined symbols, I'd probably choose the former...

bors · 2020-08-30T17:47:05Z

☔ The latest upstream changes (presumably #74862) made this pull request unmergeable. Please resolve the merge conflicts.

nnethercote · 2020-08-31T02:30:26Z

@petrochenkov: I have incorporated your commits from #75309.

@bors try @rust-timer queue

rust-timer · 2020-08-31T02:30:27Z

Awaiting bors try build completion

bors · 2020-08-31T02:30:41Z

⌛ Trying commit 23888ae with merge 946d2e6f55e986fc0427f659bb269031b255369e...

bors · 2020-08-31T03:14:01Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: 946d2e6f55e986fc0427f659bb269031b255369e

rust-timer · 2020-08-31T03:14:03Z

Queued 946d2e6f55e986fc0427f659bb269031b255369e with parent 022e1fe, future comparison URL.

rust-timer · 2020-08-31T04:30:02Z

Finished benchmarking try commit (946d2e6f55e986fc0427f659bb269031b255369e): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

nnethercote · 2020-08-31T05:39:50Z

The latest perf results are much worse -- small wins on doc builds, little change elsewhere. Not sure why. As it stands, definitely not worth the extra complexity.

petrochenkov · 2020-08-31T07:22:42Z

@nnethercote

> The latest perf results are much worse -- small wins on doc builds, little change elsewhere. Not sure why.

Perhaps due to #75813 "Lazy decoding of DefPathTable from crate metadata (non-incremental case)"?

The previous result (#74554 (comment)) showed improvements mostly for small crates, which suggests they could be related to decoding symbols from metadata, but #75813 skips that metadata decoding entirely.

nnethercote · 2020-08-31T07:39:53Z

Sounds plausible. I think we can close this.

Remove `SymbolStr` This was originally proposed in rust-lang#74554 (comment). As well as removing the icky `SymbolStr` type, it allows the removal of a lot of `&` and `*` occurrences. Best reviewed one commit at a time. r? `@oli-obk`

tesuji reviewed Jul 20, 2020

petrochenkov self-assigned this Jul 21, 2020

petrochenkov added the S-waiting-on-review label ("Status: Awaiting review from the assignee but also interested parties.") Jul 22, 2020

petrochenkov added the S-waiting-on-review label ("Status: Awaiting review from the assignee but also interested parties.") and removed the S-waiting-on-team label ("Status: Awaiting decision from the relevant subteam (see the T-<team> label).") Aug 4, 2020

petrochenkov added the S-blocked label ("Status: Marked as blocked ❌ on something else such as an RFC or other implementation work.") and removed the S-waiting-on-review label ("Status: Awaiting review from the assignee but also interested parties.") Aug 8, 2020

nnethercote mentioned this pull request Aug 9, 2020

Tweak confusable idents checking #75349

Merged

nnethercote force-pushed the inlined-symbols branch from 2adb229 to 6d3c11a Compare August 24, 2020 07:16

petrochenkov and others added 3 commits August 31, 2020 11:46

rustc_span: Remove Symbol::is_doc_keyword.

b9b11a0

The check in rustdoc using it is artificial and not helpful.

rustc_span: Generate keyword classification functions automatically.

3b1aa36

Store short symbols in the Symbol itself.

23888ae

nnethercote force-pushed the inlined-symbols branch from 6d3c11a to 23888ae Compare August 31, 2020 02:03

nnethercote closed this Aug 31, 2020

panstromek mentioned this pull request Nov 25, 2020

Split symbol interner into static unsynchronized and dynamic synchronized parts #79425

Closed

nnethercote mentioned this pull request Dec 15, 2021

Remove SymbolStr #91957

Merged
