Significantly slower --emit=metadata #47267

leonardo-m · 2018-01-08T10:46:10Z

I have found a significant time increase when I use "rustc --emit=metadata". I have located the source of such difference between two Nightlies, that I've named A and B, the time is for my code:

A:
rustup default nightly-2017-12-14
rustc 1.24.0-nightly (f8af59d95 2017-12-13)
commit-hash: f8af59d95225d122e38bb5dceb1027844d4ce170
host: x86_64-pc-windows-gnu
--emit=metadata time: 4.35 seconds

B:
rustup default nightly-2017-12-15
rustc 1.24.0-nightly (0077d128d 2017-12-14)
commit-hash: 0077d128d33543290140763ce7d84d05eb57c40a
host: x86_64-pc-windows-gnu
--emit=metadata time: 5.22 seconds

If you perform a complete compilation instead of just --emit=metadata the timing difference between the A and B compiler is of course much smaller. But --emit=metadata time is important to verify the code.

To show an example I've done:

cargo check --all

On rust-ndarray version 0.11.0, with both A and B. Then I've cleaned the directory:

rust-ndarray-master\target\debug\deps

And I've done "cargo check --all" again for both both A and B. This is not the best test for this problem, but it shows a difference (25.76 seconds for A, and 26.75 seconds for B), the difference percentage is small because it's not equivalent to just a "--emit=metadata".

The text was updated successfully, but these errors were encountered:

eddyb · 2018-01-08T10:58:36Z

cc @michaelwoerister

michaelwoerister · 2018-01-08T11:19:49Z

IIRC, metadata generation is the first point where we compute the MIR of most functions, so the slowdown does not have to be specific to metadata encoding.

alexcrichton · 2018-01-08T22:39:06Z

According to the list of changes the candidates probably are:

a number of NLL changes
addition of miri
faster span hashing (a bit of a stretch)

@leonardo-m would you be able to share detailed instructions of how to reproduce? Testing the two nightlies in question on rust-ndarray/ndarray@bda4c0d doesn't show much differnce to me for cargo check --all and incremental changes

leonardo-m · 2018-01-09T08:53:43Z

So far I haven't found a better way to reproduce the problem, my code is not public. But if no one else is having this problem, then it's just a problem of my code, and I'll close down this issue.

A simple thing that I didn't think about using is "-Ztime-passes --emit=metadata":

A:
  time: 0.058; rss: 32MB	parsing
  time: 0.000; rss: 32MB	recursion limit
  time: 0.000; rss: 32MB	crate injection
  time: 0.000; rss: 32MB	plugin loading
  time: 0.000; rss: 32MB	plugin registration
  time: 0.152; rss: 59MB	expansion
  time: 0.000; rss: 59MB	maybe building test harness
  time: 0.003; rss: 59MB	maybe creating a macro crate
  time: 0.008; rss: 59MB	creating allocators
  time: 0.000; rss: 59MB	checking for inline asm in case the target doesn't support it
  time: 0.003; rss: 59MB	AST validation
  time: 0.056; rss: 66MB	name resolution
  time: 0.008; rss: 67MB	complete gated feature checking
  time: 0.042; rss: 81MB	lowering ast -> hir
  time: 0.015; rss: 81MB	early lint checks
  time: 0.091; rss: 86MB	indexing hir
  time: 0.003; rss: 86MB	attribute checking
  time: 0.000; rss: 78MB	load query result cache
  time: 0.000; rss: 78MB	looking for entry point
  time: 0.000; rss: 78MB	looking for plugin registrar
  time: 0.003; rss: 78MB	loop checking
  time: 0.000; rss: 78MB	static item recursion checking
  time: 0.009; rss: 79MB	stability checking
  time: 0.052; rss: 99MB	type collecting
  time: 0.000; rss: 99MB	outlives testing
  time: 0.000; rss: 99MB	impl wf inference
  time: 0.008; rss: 101MB	coherence checking
  time: 0.000; rss: 101MB	variance testing
  time: 0.058; rss: 108MB	wf checking
  time: 0.017; rss: 108MB	item-types checking
  time: 2.134; rss: 134MB	item-bodies checking
  time: 0.124; rss: 139MB	const checking
  time: 0.038; rss: 139MB	privacy checking
  time: 0.005; rss: 139MB	intrinsic checking
  time: 0.020; rss: 139MB	match checking
  time: 0.062; rss: 140MB	liveness checking
  time: 1.079; rss: 186MB	borrow checking
  time: 0.001; rss: 186MB	MIR borrow checking
  time: 0.000; rss: 186MB	MIR effect checking
  time: 0.014; rss: 186MB	death checking
  time: 0.000; rss: 186MB	unused lib feature checking
  time: 0.075; rss: 188MB	lint checking
  time: 0.000; rss: 188MB	resolving dependency formats
    time: 0.000; rss: 189MB	write metadata
    time: 0.000; rss: 197MB	assert dep graph
    time: 0.000; rss: 197MB	serialize dep graph
  time: 0.053; rss: 197MB	translation
  time: 0.000; rss: 95MB	serialize work products
  time: 0.000; rss: 95MB	linking


B:
  time: 0.058; rss: 32MB	parsing
  time: 0.000; rss: 32MB	recursion limit
  time: 0.000; rss: 32MB	crate injection
  time: 0.000; rss: 32MB	plugin loading
  time: 0.000; rss: 32MB	plugin registration
  time: 0.118; rss: 59MB	expansion
  time: 0.000; rss: 59MB	maybe building test harness
  time: 0.003; rss: 59MB	maybe creating a macro crate
  time: 0.008; rss: 60MB	creating allocators
  time: 0.000; rss: 60MB	checking for inline asm in case the target doesn't support it
  time: 0.003; rss: 60MB	AST validation
  time: 0.056; rss: 67MB	name resolution
  time: 0.008; rss: 67MB	complete gated feature checking
  time: 0.042; rss: 82MB	lowering ast -> hir
  time: 0.015; rss: 82MB	early lint checks
  time: 0.050; rss: 85MB	indexing hir
  time: 0.003; rss: 85MB	attribute checking
  time: 0.000; rss: 79MB	load query result cache
  time: 0.000; rss: 79MB	looking for entry point
  time: 0.000; rss: 79MB	looking for plugin registrar
  time: 0.003; rss: 80MB	loop checking
  time: 0.000; rss: 80MB	static item recursion checking
  time: 0.009; rss: 81MB	stability checking
  time: 0.052; rss: 100MB	type collecting
  time: 0.000; rss: 100MB	outlives testing
  time: 0.000; rss: 100MB	impl wf inference
  time: 0.008; rss: 101MB	coherence checking
  time: 0.000; rss: 101MB	variance testing
  time: 0.058; rss: 108MB	wf checking
  time: 0.018; rss: 109MB	item-types checking
  time: 2.118; rss: 134MB	item-bodies checking
  time: 0.126; rss: 139MB	const checking
  time: 0.038; rss: 139MB	privacy checking
  time: 0.005; rss: 139MB	intrinsic checking
  time: 0.020; rss: 139MB	match checking
  time: 0.061; rss: 140MB	liveness checking
  time: 1.955; rss: 188MB	borrow checking
  time: 0.001; rss: 188MB	MIR borrow checking
  time: 0.000; rss: 188MB	MIR effect checking
  time: 0.014; rss: 188MB	death checking
  time: 0.000; rss: 188MB	unused lib feature checking
  time: 0.077; rss: 190MB	lint checking
  time: 0.000; rss: 190MB	resolving dependency formats
    time: 0.000; rss: 190MB	write metadata
    time: 0.000; rss: 198MB	assert dep graph
    time: 0.000; rss: 198MB	serialize dep graph
  time: 0.052; rss: 198MB	translation
  time: 0.000; rss: 95MB	serialize work products
  time: 0.000; rss: 95MB	linking

The only significant difference I am seeing in those two listings is for the borrow checking:

A: time: 1.079; rss: 186MB borrow checking

B: time: 1.955; rss: 188MB borrow checking

So the cause could be the NLL changes. I have an open issue about NLL ( #47022 ), so I can't compile my code with #![feature(nll)]. But I've seen that nll multiplies by 5.5X my "--emit=metadata" compilation time.

eddyb · 2018-01-09T08:56:10Z

cc @nikomatsakis

nikomatsakis · 2018-01-11T22:38:29Z

Huh, that's interesting. I don't have much of an idea why that would be. It'd be really nice if we could try to reproduce this with some standalone file.

nikomatsakis · 2018-01-11T22:39:11Z

I'm tagging it as WG-compiler-nll for now.

leonardo-m · 2018-01-15T18:02:49Z

This is a problem related but different from the main one discussed above. The last Nigtly (1.25.0 3f92e8d 2018-01-14) is finally able to compile my code if I add the nll feature. But compiling it with "--emit=metadata" goes from 5.30 seconds to 31.84 seconds (about 6X slower). If I also use "-Ztime-passes" I see the differences:

Without NLL:
  time: 2.001; rss: 187MB	borrow checking
  time: 0.001; rss: 187MB	MIR borrow checking

With NLL:
  time: 2.184; rss: 193MB	borrow checking
  time: 26.063; rss: 199MB	MIR borrow checking

michaelwoerister · 2018-01-16T10:38:35Z

Ouch :)

nikomatsakis · 2018-01-16T15:39:52Z

@leonardo-m perhaps you can run with perf and give some kind of profiling? It's kind of hard to diagnose what the problem is without some better way to reproduce.

(How private is the code? If you are willing to send via e-mail, I'd of course promise not to redistribute.)

leonardo-m · 2018-01-18T12:07:02Z

Yes, I could send you all the code. But first let's try with an example I've found. This small program shows some timings difference compiling with "--emit=metadata" (0.47 seconds with nll, and 0.27 seconds without):

#![feature(nll)]

use std::mem::size_of;

macro_rules! isqrt { ($e:expr, $t:ident) => ((($e) as f64).sqrt() as $t)  }

struct SieveBits { bits: Vec<usize> }

impl SieveBits {
    const BPC: usize = size_of::<usize>() * 8;

    fn is_bit_set(bits: &[usize], i: usize) -> bool {
        debug_assert!(i < bits.len() * Self::BPC);
        let offset = i / Self::BPC;
        let mask = 1 << (i % Self::BPC);
        unsafe { (*bits.get_unchecked(offset) & mask) != 0 }
    }

    fn reset_bit(bits: &mut [usize], i: usize) {
        debug_assert!(i < bits.len() * Self::BPC);
        let offset = i / Self::BPC;
        let mask = 1 << (i % Self::BPC);
        unsafe { *bits.get_unchecked_mut(offset) &= !mask; }
    }

    fn new(m: usize) -> Self {
        if m < 2 {
            return Self { bits: vec![] };
        }
        let mut primes = vec![std::usize::MAX; (m / 3) / Self::BPC + 1];
        Self::reset_bit(&mut primes, 0);

        let lim = isqrt!(m, usize) + 1;
        let mut i = 5;
        let mut step = 2;
        while i < lim {
            if Self::is_bit_set(&primes, i / 3) {
                let mut j = i * i;
                let mut step2 = step;
                while j < m {
                    Self::reset_bit(&mut primes, j / 3);
                    j += step2 * i;
                    step2 = 6 - step2;
                }
            }
            i += step;
            step = 6 - step;
        }

        Self { bits: primes }
    }

    fn is_prime(&self, i: usize) -> bool {
        if i == 2 || i == 3 { true }
        else if i % 2 == 0 || i % 3 == 0 { false }
        else { Self::is_bit_set(&self.bits, i / 3) }
    }
}

fn eratosthenes_sieve_u32(limit: usize) -> Vec<u32> {
    assert!(limit <= std::u32::MAX as usize);

    #[inline(always)]
    fn fill_u8(data: &mut [u8], value: u8) {
        unsafe {
            std::ptr::write_bytes(data.as_mut_ptr(), value, data.len());
        }
    }

    const L1D_CACHE_SIZE:u32 = 32_768;

    let mut result = vec![];
    if limit < 2 {
        return result;
    } else {
        result.push(2);
    }

    let lsqrt = isqrt!(limit, u32);

    let mut is_prime = vec![1u8; lsqrt as usize + 1];
    let mut i = 2;
    while i * i <= lsqrt {
        unsafe {
            if *is_prime.get_unchecked(i as usize) != 0 {
                let mut j = i * i;
                while j <= lsqrt {
                    *is_prime.get_unchecked_mut(j as usize) = 0;
                    j += i;
                }
            }
        }
        i += 1;
    }

    let segment_size = lsqrt.max(L1D_CACHE_SIZE);
    let mut s: usize = 3;
    let mut n: usize = 3;

    let mut sieve = vec![1u8; segment_size as usize];
    let mut primes: Vec<u32> = vec![];
    let mut next: Vec<u32> = vec![];
    let mut low: usize = 0;

    while low <= limit {
        fill_u8(&mut sieve, 1);

        let high = limit.min(low + segment_size as usize - 1);

        unsafe {
            while s * s <= high {
                if *is_prime.get_unchecked(s) != 0 {
                    primes.push(s as u32);
                    next.push((s * s - low) as u32);
                }
                s += 2;
            }

            for (i, &p) in primes.iter().enumerate() {
                let k = p * 2;
                let mut j = *next.get_unchecked(i);
                while j < segment_size {
                    *sieve.get_unchecked_mut(j as usize) = 0;
                    j += k;
                }
                *next.get_unchecked_mut(i) = j - segment_size;
            }

            while n <= high {
                if *sieve.get_unchecked(n - low) != 0 {
                    result.push(n as u32);
                }
                n += 2;
            }
        }

        low += segment_size as usize;
    }

    result
}

fn main() {
    fn e60() -> u32 {
        struct DPrime { n: u32, fac: u32 }

        impl DPrime {
            fn new(p: u32) -> Self {
                fn next_pow10(n: u32) -> u32 {
                    let mut p = 10;
                    while n > p { p *= 10; }
                    p
                }

                Self { n: p, fac: next_pow10(p) }
            }
        }

        const LIM: u32 = 10_000;

        let mut primes = vec![DPrime::new(3)];
        primes.extend(eratosthenes_sieve_u32(LIM as usize).iter().skip(3).map(|&p| DPrime::new(p)));

        let sieve = SieveBits::new((LIM * LIM) as usize);

        let are_coprimes = |a: &DPrime, b: &DPrime|
            b.n > a.n &&
            sieve.is_prime((a.n + a.fac * b.n) as usize) &&
            sieve.is_prime((b.n + b.fac * a.n) as usize);

        let mut best_sum = std::u32::MAX;

        for p1 in &primes {
            if p1.n >= best_sum { break; }
            let p2v: Vec<_> = primes.iter().filter(|p2| are_coprimes(p1, p2)).collect();
            for p2 in &p2v {
                if p1.n + p2.n >= best_sum { break; }
                let p3v: Vec<_> = p2v.iter().filter(|p3| are_coprimes(p2, p3)).collect();
                for p3 in &p3v {
                    if p1.n + p2.n + p3.n >= best_sum { break; }
                    let p4v: Vec<_> = p3v.iter().filter(|p4| are_coprimes(p3, p4)).collect();
                    for p4 in &p4v {
                        if p1.n + p2.n + p3.n + p4.n >= best_sum { break; }
                        if let Some(p5) = p4v.iter().find(|p5| are_coprimes(p4, p5)) {
                            let sum = p1.n + p2.n + p3.n + p4.n + p5.n;
                            if sum < best_sum {
                                best_sum = sum;
                            }
                        }
                    }
                }
            }
        }
        best_sum
    }
    assert_eq!(e60(), 26_033);
}


Without nll feature:
time: 0.000; rss: 69MB    MIR borrow checking

With nll feature:
time: 0.203; rss: 70MB    MIR borrow checking

nikomatsakis · 2018-01-18T22:35:06Z

@leonardo-m thanks =)

nikomatsakis · 2018-01-24T14:50:29Z

Some notes on the runtime of this test that I've extracted from profiling:

First, 12% of compilation time (just a regular build) is spent in librustc_mir::borrow_check
Of that, virtually all of it is spent in NLL, and NLL divides its time roughly equally between:
- the MIR type check (~6%)
- the region inference (~6%)

nikomatsakis · 2018-01-24T14:55:11Z

Browsing the MIR for the main function is interesting (here is a gist), you can definitely see some of the density/sparsity of the region values.

spastorino · 2018-01-26T21:09:36Z

Some numbers of @leonardo-m example, with and without this PR #47766
I'm showing the average numbers after running time rustc several times.

Without PR

[santiago@archlinux rust2 (master)]$ time ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc main.rs

real	0m2.822s
user	0m3.043s
sys	0m0.064s

With PR

[santiago@archlinux rust1 (inference-dirty-list)]$ time ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc main.rs

real	0m1.125s
user	0m1.262s
sys	0m0.086s

With PR, NLL turned off

[santiago@archlinux rust1 (inference-dirty-list)]$ time ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc main.rs

real	0m0.869s
user	0m1.012s
sys	0m0.081s

So, there's still a lot to improve but getting closer 😄

leonardo-m · 2018-02-05T09:42:45Z

The compile time with --emit=metadata and nll feature has improved a bit, but it's still about 5X slower. The compile time without the nll has improved in the last two nightlies and it's now almost OK.

leonardo-m · 2018-05-20T22:28:07Z

Since the last Nightly the compiler improvements have essentially put the compiler back to the original performance discussed here. The performance of the NLL feature is discussed elsewhere, so I close down this issue.

leonardo-m mentioned this issue Jan 8, 2018

Slower binary, slower compiler, bigger binary since some days #46897

Closed

nikomatsakis added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll I-compiletime Issue: Problems and improvements with respect to compile times. labels Jan 11, 2018

nikomatsakis modified the milestone: NLL run-pass without ICEs Jan 18, 2018

leonardo-m mentioned this issue Jan 19, 2018

NLL can be very slow #47597

Closed

nikomatsakis added this to the NLL: Performance is good milestone Jan 19, 2018

nikomatsakis added the A-NLL Area: Non Lexical Lifetimes (NLL) label Jan 19, 2018

This was referenced Jan 27, 2018

Make region inference use a dirty list #47766

Merged

NLL performance tracking issue #47879

Closed

nikomatsakis added the NLL-performant Working towards the "performance is good" goal label Mar 14, 2018

XAMPPRocky added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Apr 10, 2018

leonardo-m closed this as completed May 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Significantly slower --emit=metadata #47267

Significantly slower --emit=metadata #47267

leonardo-m commented Jan 8, 2018 •

edited

Loading

eddyb commented Jan 8, 2018

michaelwoerister commented Jan 8, 2018

alexcrichton commented Jan 8, 2018

leonardo-m commented Jan 9, 2018 •

edited

Loading

eddyb commented Jan 9, 2018

nikomatsakis commented Jan 11, 2018 •

edited

Loading

nikomatsakis commented Jan 11, 2018

leonardo-m commented Jan 15, 2018

michaelwoerister commented Jan 16, 2018

nikomatsakis commented Jan 16, 2018

leonardo-m commented Jan 18, 2018 •

edited

Loading

nikomatsakis commented Jan 18, 2018

nikomatsakis commented Jan 24, 2018

nikomatsakis commented Jan 24, 2018

spastorino commented Jan 26, 2018

leonardo-m commented Feb 5, 2018

leonardo-m commented May 20, 2018

Significantly slower --emit=metadata #47267

Significantly slower --emit=metadata #47267

Comments

leonardo-m commented Jan 8, 2018 • edited Loading

eddyb commented Jan 8, 2018

michaelwoerister commented Jan 8, 2018

alexcrichton commented Jan 8, 2018

leonardo-m commented Jan 9, 2018 • edited Loading

eddyb commented Jan 9, 2018

nikomatsakis commented Jan 11, 2018 • edited Loading

nikomatsakis commented Jan 11, 2018

leonardo-m commented Jan 15, 2018

michaelwoerister commented Jan 16, 2018

nikomatsakis commented Jan 16, 2018

leonardo-m commented Jan 18, 2018 • edited Loading

nikomatsakis commented Jan 18, 2018

nikomatsakis commented Jan 24, 2018

nikomatsakis commented Jan 24, 2018

spastorino commented Jan 26, 2018

leonardo-m commented Feb 5, 2018

leonardo-m commented May 20, 2018

leonardo-m commented Jan 8, 2018 •

edited

Loading

leonardo-m commented Jan 9, 2018 •

edited

Loading

nikomatsakis commented Jan 11, 2018 •

edited

Loading

leonardo-m commented Jan 18, 2018 •

edited

Loading