Micro-optimize the heck out of LEB128 reading and writing. #69050

Merged
merged 1 commit into rust-lang:master from nnethercote:micro-optimize-leb128 on Feb 13, 2020

Conversation

nnethercote
Contributor

This commit makes the following writing improvements:

  • Removes the unnecessary `write_to_vec` function.
  • Reduces the number of conditions per loop from 2 to 1.
  • Avoids a mask and a shift on the final byte.

And the following reading improvements:

  • Removes an unnecessary type annotation.
  • Fixes a dangerous unchecked slice access. Imagine a slice `[0x80]` --
    the current code will read some number of bytes past the end of the
    slice. The bounds check at the end will subsequently trigger, unless
    something bad (like a crash) happens first. The cost of doing the
    bounds check in the loop body is negligible.
  • Avoids a mask on the final byte.

And the following improvements for both reading and writing:

  • Changes `for` to `loop` for the loops, avoiding an unnecessary
    condition on each iteration. This also removes the need for
    `leb128_size`.

All of these changes give significant perf wins, up to 5%.
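For illustration, here is a rough sketch (specialized to u32) of what loops with these properties look like. It is reconstructed from the bullet points above rather than copied from the patch, so the details may not match the code that actually landed:

pub fn write_u32_leb128(out: &mut Vec<u8>, mut value: u32) {
    loop {
        if value < 0x80 {
            // Final byte: the high bit is already clear, so no mask or shift.
            out.push(value as u8);
            break;
        } else {
            // Continuation byte: low 7 bits of the value plus the high bit.
            out.push(((value & 0x7f) | 0x80) as u8);
            value >>= 7;
        }
    }
}

pub fn read_u32_leb128(slice: &[u8]) -> (u32, usize) {
    let mut result = 0;
    let mut shift = 0;
    let mut position = 0;
    loop {
        // Ordinary indexing keeps the bounds check inside the loop body.
        let byte = slice[position];
        position += 1;
        if (byte & 0x80) == 0 {
            // Final byte: the high bit is zero, so no mask is needed.
            result |= (byte as u32) << shift;
            return (result, position);
        }
        result |= ((byte & 0x7f) as u32) << shift;
        shift += 7;
    }
}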

r? @michaelwoerister

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 11, 2020
@nnethercote
Contributor Author

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion

@bors
Contributor

bors commented Feb 11, 2020

⌛ Trying commit ad7802f with merge d902ca046d0a8cc72dd69a16627fa5da540030f1...

@nnethercote
Contributor Author

Local check results:

clap-rs-check
        avg: -2.7%      min: -5.6%      max: -0.0%
ucd-check
        avg: -1.3%      min: -2.8%      max: -0.4%
coercions-check
        avg: -1.0%?     min: -2.2%?     max: -0.0%?
tuple-stress-check
        avg: -0.7%      min: -1.6%      max: -0.0%
wg-grammar-check
        avg: -0.6%      min: -1.6%      max: -0.0%
html5ever-check
        avg: -0.9%      min: -1.4%      max: -0.2%
script-servo-check
        avg: -0.8%      min: -1.1%      max: -0.1%
cranelift-codegen-check
        avg: -0.5%      min: -1.0%      max: -0.1%
unused-warnings-check
        avg: -0.4%      min: -1.0%      max: -0.0%
webrender-check
        avg: -0.6%      min: -1.0%      max: -0.1%
regression-31157-check
        avg: -0.6%      min: -1.0%      max: -0.2%
regex-check
        avg: -0.7%      min: -1.0%      max: -0.1%
piston-image-check
        avg: -0.6%      min: -0.9%      max: -0.1%
cargo-check
        avg: -0.5%      min: -0.9%      max: -0.0%
webrender-wrench-check
        avg: -0.6%      min: -0.8%      max: -0.1%
hyper-2-check
        avg: -0.4%      min: -0.8%      max: -0.1%
keccak-check
        avg: -0.3%      min: -0.8%      max: -0.0%
futures-check
        avg: -0.5%      min: -0.8%      max: -0.1%
syn-check
        avg: -0.5%      min: -0.8%      max: -0.1%
packed-simd-check
        avg: -0.4%      min: -0.8%      max: -0.0%
ripgrep-check
        avg: -0.5%      min: -0.8%      max: -0.1%
serde-check
        avg: -0.3%      min: -0.8%      max: -0.0%
encoding-check
        avg: -0.5%      min: -0.8%      max: -0.1%
serde-serde_derive-check
        avg: -0.4%      min: -0.7%      max: -0.0%
style-servo-check
        avg: -0.4%      min: -0.7%      max: -0.0%
tokio-webpush-simple-check
        avg: -0.5%      min: -0.7%      max: -0.2%
inflate-check
        avg: -0.2%      min: -0.7%      max: -0.0%
await-call-tree-check
        avg: -0.6%      min: -0.7%      max: -0.4%
issue-46449-check
        avg: -0.5%      min: -0.7%      max: -0.4%
wf-projection-stress-65510-che...
        avg: -0.2%      min: -0.6%      max: 0.0%
unicode_normalization-check
        avg: -0.2%      min: -0.6%      max: -0.0%
helloworld-check
        avg: -0.3%      min: -0.5%      max: -0.1%
ctfe-stress-4-check
        avg: -0.2%?     min: -0.5%?     max: 0.2%?
unify-linearly-check
        avg: -0.3%      min: -0.4%      max: -0.2%
deeply-nested-check
        avg: -0.3%      min: -0.4%      max: -0.2%
deep-vector-check
        avg: -0.1%      min: -0.3%      max: -0.0%
token-stream-stress-check
        avg: -0.1%      min: -0.1%      max: -0.0%

The biggest improvements are on "clean incremental" runs, followed by "patched incremental".

@bors
Contributor

bors commented Feb 11, 2020

☀️ Try build successful - checks-azure
Build commit: d902ca046d0a8cc72dd69a16627fa5da540030f1

@rust-timer
Collaborator

Queued d902ca046d0a8cc72dd69a16627fa5da540030f1 with parent dc4242d, future comparison URL.

@michaelwoerister
Member

That's interesting. I remember that switching the code from loop to for sped up the code considerably a couple of years ago. My theory now is that the past speedup came from duplicating the machine code for each integer type, allowing the branch predictor to do a better job, and that this effect was big enough to outweigh the extra overhead the for loop introduced.

Anyway, I'm happy to get any kind of improvement here. And it's even safer than before 🎉

(In case someone is interested in the history of this implementation: https://github.com/michaelwoerister/encoding-bench contains a number of different versions that I tried out. It's rather messy, as it's essentially a private repo, but an interesting aspect is the test data files, which are generated from actual rustc invocations.)

@rust-timer
Collaborator

Finished benchmarking try commit d902ca046d0a8cc72dd69a16627fa5da540030f1, comparison URL.

@michaelwoerister
Member

@bors r+

Thanks, @nnethercote!

@bors
Contributor

bors commented Feb 12, 2020

📌 Commit ad7802f has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 12, 2020
@nnethercote
Contributor Author

@bors r- until I have tried out @ranma42's suggestion.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 12, 2020
@ranma42
Contributor

ranma42 commented Feb 12, 2020

I was just finding it strange that the most significant bit was cleared out (_ & 0x7f) just before it was being set (_ | 0x80).
I do not think it should make any difference in the timing (or even the generated code, as I believe LLVM will optimize it out).
If this is a performance-sensitive part of the compiler, I will try to have a deeper look :)
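
(A quick check of that observation, mine rather than anything from the PR: once the result is truncated to a byte, clearing the high bit just before forcing it back on cannot change anything.)

fn main() {
    for v in 0u32..=0x3FFFF {
        // Masking bit 7 before setting it is a no-op after the `as u8` cast.
        assert_eq!(((v & 0x7f) | 0x80) as u8, (v | 0x80) as u8);
    }
}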

@eddyb
Member

eddyb commented Feb 12, 2020

@nnethercote If you're bored, I wonder how this implementation compares to the pre-#59820 one in libproc_macro (which I implemented from scratch in safe code).

It definitely feels like your new version here is close to mine, but without checking I can't tell which one LLVM will prefer (or if they compile all the same).

EDIT: also, has anyone considered using SIMD here, like @BurntSushi and others have employed for handling UTF-8/regexes etc.? I'm asking because UTF-8 is like a more complex LEB128.

@bjorn3
Member

bjorn3 commented Feb 12, 2020

UTF-8 validation handles a lot of codepoints every call, while these read and write methods only handle a single LEB128 int per call, so SIMD is likely not useful.

@eddyb
Member

eddyb commented Feb 12, 2020

while these read and write methods only handle a single LEB128 int per call

May not be relevant, but the serialized data is basically a sequence of LEB128s (perhaps intermixed with strings); they just semantically represent more hierarchical values than a UTF-8 stream.

@ranma42
Contributor

ranma42 commented Feb 12, 2020

If you are willing to do processor-specific tuning, PDEP/PEXT (available on modern x86 processors) might be better suited than generic SIMD for this task.

@gereeter
Contributor

also, has anyone considered using SIMD here

See also Masked VByte [arXiv].

@nnethercote
Contributor Author

@nnethercote If you're bored, I wonder how this implementation compares to the pre-#59820 one in libproc_macro (which I implemented from scratch in safe code).

I tried the read and write implementations from libproc_macro individually; both were slower than the code in this PR.

@nnethercote
Contributor Author

also, has anyone considered using SIMD here

See also Masked VByte [arXiv].

Thanks for the link, I will take a look... but not in this PR :)

@nnethercote
Contributor Author

@bors r=michaelwoerister

@bors
Contributor

bors commented Feb 13, 2020

📌 Commit ad7802f has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 13, 2020
@nnethercote
Contributor Author

BTW, in case anyone is curious, here's how I approached this bug. From profiling with Callgrind I saw that clap-rs-Check-CleanIncr was the benchmark+run+build combination most affected by LEB128 encoding. Its text output has entries like this:

265,344,872 ( 2.97%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:rustc::ty::query::on_disk_cache::__ty_decoder_impl::<impl serialize::serialize::Decoder for rustc::ty::query::on_disk_cache::CacheDecoder>::read_usize
236,097,015 ( 2.64%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::ty::query::on_disk_cache::CacheEncoder<E> as serialize::serialize::Encoder>::emit_u32
213,551,888 ( 2.39%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:rustc::ty::codec::encode_with_shorthand
165,042,682 ( 1.85%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc_target::abi::VariantIdx as serialize::serialize::Decodable>::decode
 40,540,500 ( 0.45%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<u32 as serialize::serialize::Encodable>::encode
 24,026,292 ( 0.27%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:serialize::serialize::Encoder::emit_seq
 20,160,540 ( 0.23%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::dep_graph::serialized::SerializedDepNodeIndex as serialize::serialize::Decodable>::decode
  9,661,323 ( 0.11%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:serialize::serialize::Decoder::read_tuple
  4,898,927 ( 0.05%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::ty::query::on_disk_cache::CacheEncoder<E> as serialize::serialize::Encoder>::emit_usize
  3,384,018 ( 0.04%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc_metadata::rmeta::encoder::EncodeContext as serialize::serialize::Encoder>::emit_u32
  2,296,440 ( 0.03%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::ty::UniverseIndex as serialize::serialize::Decodable>::decode

These are instruction counts, and the percentages sum to about 11%. Lots of different functions are involved because the LEB128 functions are inlined, but the file is leb128.rs in all of them, so I could tell where the relevant code lives. And the annotated code in that file looks like this:

          .           macro_rules! impl_write_unsigned_leb128 {
          .               ($fn_name:ident, $int_ty:ident) => {
          .                   #[inline]
          .                   pub fn $fn_name(out: &mut Vec<u8>, mut value: $int_ty) {
          .                       for _ in 0..leb128_size!($int_ty) {
143,877,210 ( 1.61%)                  let mut byte = (value & 0x7F) as u8;
 48,003,612 ( 0.54%)                  value >>= 7;
239,884,434 ( 2.69%)                  if value != 0 {
 47,959,070 ( 0.54%)                      byte |= 0x80;
          .                           }
          .
          .                           write_to_vec(out, byte);
          .
 47,959,070 ( 0.54%)                  if value == 0 {
          .                               break;
          .                           }
          .                       }
          .                   }
          .               };
          .           }
          .
          .           impl_write_unsigned_leb128!(write_u16_leb128, u16);
-- line 50 ----------------------------------------
-- line 57 ----------------------------------------
          .               ($fn_name:ident, $int_ty:ident) => {
          .                   #[inline]
          .                   pub fn $fn_name(slice: &[u8]) -> ($int_ty, usize) {
          .                       let mut result: $int_ty = 0;
          .                       let mut shift = 0;
          .                       let mut position = 0;
          .
          .                       for _ in 0..leb128_size!($int_ty) {
 59,507,824 ( 0.67%)                  let byte = unsafe { *slice.get_unchecked(position) };
          .                           position += 1;
204,126,888 ( 2.29%)                  result |= ((byte & 0x7F) as $int_ty) << shift;
119,023,350 ( 1.33%)                  if (byte & 0x80) == 0 {
          .                               break;
          .                           }
          .                           shift += 7;
          .                       }
          .
          .                       // Do a single bounds check at the end instead of for every byte.
 67,805,748 ( 0.76%)              assert!(position <= slice.len());
          .
          .                       (result, position)
          .                   }
          .               };
          .           }

Those percentages also add up to about 11%. Plus I poked around a bit at call sites and found this in a different file (libserialize/opaque.rs):

         .           macro_rules! read_uleb128 {
          .               ($dec:expr, $fun:ident) => {{
100,680,777 ( 1.13%)          let (value, bytes_read) = leb128::$fun(&$dec.data[$dec.position..]);
 67,858,196 ( 0.76%)          $dec.position += bytes_read;
 43,378,625 ( 0.49%)          Ok(value)
          .               }};
          .           }

which is another 2.38%. So it was clear that LEB128 reading/writing was hot.

I then tried gradually improving the code. I ended up measuring 18 different changes to the code. 10 of them were improvements (which I kept), and 8 were regressions (which I discarded). The following table shows the notes I took. The descriptions of the changes are a bit cryptic, but the basic technique should be clear.

IMPROVEMENTS
            clap-rs-Check-CleanIncr
feb10/Leb0  8,992M        $RUSTC0
feb10/Leb1  8,927M/99.3%  First attempt
feb11/Leb4  8,996M        $RUSTC0 but with bounds checking
feb11/Leb5  8,983M        `loop` for reading
feb11/Leb6  8,928M/99.3%  `loop` for writing, `write_to_vec` removed
feb11/Leb8  8,829M/98.1%  avoid mask on final byte in read loop
feb11/Leb9  8,529M/94.8%  in write loop, avoid a condition
feb11/Leb10 8,488M/94.4%  in write loop, mask/shift on final byte
feb13/Leb13 8,488M/94.4%  in write loop, push `(value | 0x80) as u8`
feb13/Leb15 8,488M/94.4%  in read loop, do `as` before `&`
feb13/Leb18 8,492M/94.4%  Landed (not sure about the extra 4M, oh well)

REGRESSIONS
feb11/Leb2  8,927M/99.3%  add slice0, slice1, slice2 vars
feb11/Leb3  9,127M        move the slow loop into a separate no-inline function
feb11/Leb7  8,930M        `< 128` in read loop
feb11/Leb11 8,492M        use `byte < 0x80` in read loop
feb12/Leb12 8,721M        unsafe pushing in write
feb13/Leb14 8,494M/94.4%  in write loop, push `(value as u8) | 0x80`
feb13/Leb16 8,831M        eddyb's write loop
feb13/Leb17 8,578M        eddyb's read loop

Every iteration took about 6.5 minutes to recompile, and about 2 minutes to measure with Cachegrind. I interleaved these steps with other work, so in practice each iteration took anywhere from 10-30 minutes, depending on context-switching delays.

The measurements in the notes are close to those from the CI run, which indicate the following for clap-rs-Check-CleanIncr:

  • instructions: -5.3%
  • cycles: -4.4%
  • wall-time: -3.9%

Instruction counts are almost deterministic and highly reliable. Cycle counts are more variable but still reasonable. Wall-time is highly variable and barely trustworthy. But they're all pointing in the same direction, which is encouraging.

Looking at the instruction counts, we saw that LEB128 operations were about 11-13% of instructions originally, and instruction counts went down by about 5%, which suggests that the LEB128 operations are a bit less than twice as fast as they were. Pretty good.
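
(Spelling out that arithmetic, using ~12% as a midpoint of the 11-13% range: after the change the LEB128 code accounts for roughly 12% − 5% = 7% of the original instruction count, so it now executes about 7/12 ≈ 0.58 of its former instructions, i.e. it is roughly 12/7 ≈ 1.7× faster.)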

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 13, 2020
…r=michaelwoerister

Micro-optimize the heck out of LEB128 reading and writing.

bors added a commit that referenced this pull request Feb 13, 2020
Rollup of 9 pull requests

Successful merges:

 - #67642 (Relax bounds on HashMap/HashSet)
 - #68848 (Hasten macro parsing)
 - #69008 (Properly use parent generics for opaque types)
 - #69048 (Suggestion when encountering assoc types from hrtb)
 - #69049 (Optimize image sizes)
 - #69050 (Micro-optimize the heck out of LEB128 reading and writing.)
 - #69068 (Make the SGX arg cleanup implementation a NOP)
 - #69082 (When expecting `BoxFuture` and using `async {}`, suggest `Box::pin`)
 - #69104 (bootstrap: Configure cmake when building sanitizer runtimes)

Failed merges:

r? @ghost
@bors bors merged commit ad7802f into rust-lang:master Feb 13, 2020
@bors
Contributor

bors commented Feb 13, 2020

☔ The latest upstream changes (presumably #69118) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 13, 2020
@nnethercote nnethercote deleted the micro-optimize-leb128 branch February 13, 2020 08:29
@Veedrac
Contributor

Veedrac commented Feb 13, 2020

In response to earlier comments, PDEP can be used to encode with something like this (untested):

fn leb128enc(value: u32) -> [u8; 8] {
    // BMI2 intrinsic; needs `use std::arch::x86_64::_pdep_u64;` and a
    // bmi2-capable target.
    let hi = 0x8080_8080_8080_8080u64;
    // Spread the value's bits into the low 7 bits of each output byte.
    let split = unsafe { _pdep_u64(value as u64, !hi) };
    // Set the continuation bit on every byte below the highest used one.
    let tags = (!0 >> (split | 1).leading_zeros()) & hi;
    return (split | tags).to_le_bytes();
}

You can do a similar thing with PEXT for decoding. Encoding larger integers is probably just best off using a branch to handle full chunks of 56 bits (with `let tags = hi`) before finishing with the above.
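
For completeness, a rough untested companion sketch of that decode direction (my own, not from this thread; the name leb128dec, the fixed 8-byte input, and the well-formed-input assumption are all hypothetical):

#[target_feature(enable = "bmi2")]
unsafe fn leb128dec(bytes: [u8; 8]) -> (u32, usize) {
    use std::arch::x86_64::_pext_u64;
    let word = u64::from_le_bytes(bytes);
    let hi = 0x8080_8080_8080_8080u64;
    // The terminating byte is the first one whose continuation bit is clear;
    // this assumes the value really does terminate within these 8 bytes.
    let len = (!word & hi).trailing_zeros() as usize / 8 + 1;
    // Keep only the 7 payload bits of each byte that belongs to this value.
    let mask = !hi & (!0u64 >> (64 - 8 * len));
    (_pext_u64(word, mask) as u32, len)
}

In practice this would sit behind is_x86_feature_detected!("bmi2"), and PEXT's microcoded implementation on some AMD chips (see the comments below) makes it something to benchmark rather than assume.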

@nnethercote
Contributor Author

@fitzgen tried using PEXT a while back in a different project. For the common case (small integers that fit in 1 byte) it was a slight slowdown:
https://twitter.com/fitzgen/status/1138784734417432576

@fitzgen
Member

fitzgen commented Feb 13, 2020

Also, on Intel chips, PEXT is implemented in hardware and super fast (one or two cycles IIRC), but on AMD it is implemented in microcode and is muuuuuch slower (150-300 cycles). Would have to be careful with it.

@Veedrac
Contributor

Veedrac commented Feb 13, 2020

@nnethercote The thing I would worry about with PEXT is the copy; if you do that byte-at-a-time (or with memcpy) you probably eat a lot of the gains. The key to a fast variable-length copy is to always append the maximum size and then bump the pointer by the length instead (or truncate the vector, in the Rust case). Being able to avoid the >10% mispredict rate probably pays for the few extra instructions in the common cases, but you need to specifically design for that.
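
Concretely, that pattern might look something like the following untested sketch, building on the leb128enc sketch above (push_leb128 and the length computation are my assumptions, not code from the PR):

fn push_leb128(out: &mut Vec<u8>, value: u32) {
    let encoded = leb128enc(value); // 8 little-endian bytes, as sketched above.
    // Length = index of the highest non-zero byte plus one, with a minimum of
    // one byte so that value == 0 still emits a single 0x00.
    let len = (8 - u64::from_le_bytes(encoded).leading_zeros() as usize / 8).max(1);
    // Always copy the full fixed-size chunk, then shrink to the real length,
    // so the copy itself never branches on the value.
    out.extend_from_slice(&encoded);
    out.truncate(out.len() - 8 + len);
}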

nnethercote added a commit to nnethercote/rust that referenced this pull request Feb 16, 2020
bors added a commit that referenced this pull request Feb 16, 2020
Tweak LEB128 reading some more.

PR #69050 changed LEB128 reading and writing. After it landed I did some
double-checking and found that the writing changes were universally a
speed-up, but the reading changes were not. I'm not exactly sure why,
perhaps there was a quirk of inlining in the particular revision I was
originally working from.

This commit reverts some of the reading changes, while still avoiding
`unsafe` code. I have checked it on multiple revisions and the speed-ups
seem to be robust.

r? @michaelwoerister
@nnethercote
Contributor Author

#92604 is a successor to this PR, for those who like LEB128 micro-optimizations.

d-e-s-o added a commit to libbpf/blazesym that referenced this pull request Jun 5, 2024
As it turns out, the Rust compiler uses variable-length LEB128-encoded
integers internally. It so happens that they spent a fair amount of
effort micro-optimizing the decoding functionality [0] [1], as it's in
the hot path.
With this change we replace our decoding routines with these optimized
ones. To make that happen more easily (and to gain some baseline speed
up), also remove the "shift" return from the respective methods. As a
result of these changes, we see a respectable speed up:

Before:
  test util::tests::bench_u64_leb128_reading  ... bench:  128 ns/iter (+/- 10)

After:
  test util::tests::bench_u64_leb128_reading  ... bench:  103 ns/iter (+/- 5)

Gsym decoding, which uses these routines, improved as follows:
  main/symbolize_gsym_multi_no_setup
    time:   [146.26 µs 146.69 µs 147.18 µs]
    change: [−7.2075% −5.7106% −4.4870%] (p = 0.00 < 0.02)
    Performance has improved.

[0] rust-lang/rust#69050
[1] rust-lang/rust#69157

Signed-off-by: Daniel Müller <deso@posteo.net>