[1.; 10] generates worse code than [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.] #56333

Closed
jrmuizel opened this issue Nov 29, 2018 · 6 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@jrmuizel
Contributor

Here's an example:

pub struct L {
    a: [f64; 10],
}

pub struct Allocation<'a, T: 'a> {
    f: &'a mut T,
}

impl<'a, T> Allocation<'a, T> {
    pub fn init(self, value: T) {
        *self.f = value;
    }
}

#[inline(never)]
pub fn foo(a: Allocation<L>) {
    a.init(L {
        a: [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
    });
}

#[inline(never)]
pub fn bar(a: Allocation<L>) {
    a.init(L { a: [1.; 10] });
}

gives

.LCPI0_0:
  .quad 4607182418800017408
  .quad 4607182418800017408
example::foo:
  movaps xmm0, xmmword ptr [rip + .LCPI0_0]
  movups xmmword ptr [rdi], xmm0
  movups xmmword ptr [rdi + 16], xmm0
  movups xmmword ptr [rdi + 32], xmm0
  movups xmmword ptr [rdi + 48], xmm0
  movups xmmword ptr [rdi + 64], xmm0
  ret

.LCPI1_0:
  .quad 4607182418800017408
  .quad 4607182418800017408
example::bar:
  sub rsp, 88
  movaps xmm0, xmmword ptr [rip + .LCPI1_0]
  movaps xmmword ptr [rsp], xmm0
  movaps xmmword ptr [rsp + 16], xmm0
  movaps xmmword ptr [rsp + 32], xmm0
  movaps xmmword ptr [rsp + 48], xmm0
  movaps xmmword ptr [rsp + 64], xmm0
  movaps xmm0, xmmword ptr [rsp + 64]
  movups xmmword ptr [rdi + 64], xmm0
  movaps xmm0, xmmword ptr [rsp + 48]
  movups xmmword ptr [rdi + 48], xmm0
  movaps xmm0, xmmword ptr [rsp + 32]
  movups xmmword ptr [rdi + 32], xmm0
  movaps xmm0, xmmword ptr [rsp + 16]
  movups xmmword ptr [rdi + 16], xmm0
  movaps xmm0, xmmword ptr [rsp]
  movups xmmword ptr [rdi], xmm0
  add rsp, 88
  ret

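The code for bar has an additional copy of the array: the array is first built on the stack and then copied to the destination, while foo stores directly to the destination.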
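For anyone who wants to confirm that the two functions differ only in generated code and not in behaviour, here is a minimal sketch. It reuses the types from the example above; the helper `filled_by` is a hypothetical name added for illustration and is not part of the original report.

```rust
pub struct L {
    a: [f64; 10],
}

pub struct Allocation<'a, T: 'a> {
    f: &'a mut T,
}

impl<'a, T> Allocation<'a, T> {
    pub fn init(self, value: T) {
        *self.f = value;
    }
}

#[inline(never)]
pub fn foo(a: Allocation<L>) {
    a.init(L {
        a: [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
    });
}

#[inline(never)]
pub fn bar(a: Allocation<L>) {
    a.init(L { a: [1.; 10] });
}

/// Hypothetical helper (not in the original report): runs one of the
/// initializers against a zeroed slot and returns the resulting array.
pub fn filled_by(init: fn(Allocation<L>)) -> [f64; 10] {
    let mut slot = L { a: [0.0; 10] };
    init(Allocation { f: &mut slot });
    slot.a
}
```

Comparing `filled_by(foo)` with `filled_by(bar)` should show identical contents, so the extra stack round trip in bar is purely a codegen difference.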

@oli-obk oli-obk added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-codegen Area: Code generation labels Nov 29, 2018
@nagisa
Member

nagisa commented Dec 6, 2018

We generate an explicit loop for all inline repeat initializers, which is why the codegen is usually worse than for plain literals.
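As a rough illustration of the difference described here, the following hand-written Rust approximates the two shapes. This is only a sketch under assumptions: the function names are hypothetical, the actual lowering happens in the compiler's intermediate representation rather than in source code, and the `Default::default()` call merely stands in for a temporary that the real lowering would leave uninitialized.

```rust
/// Roughly what a literal `[1., 1., ...]` amounts to: every element is a
/// known constant, so the whole value can be stored to the destination.
pub fn init_literal(dst: &mut [f64; 10]) {
    *dst = [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.];
}

/// Hand-written approximation of `[1.; 10]` lowered as an explicit loop:
/// a temporary is filled element by element and then moved into place.
pub fn init_repeat_as_loop(dst: &mut [f64; 10]) {
    // Placeholder initialization (assumption for this sketch only).
    let mut tmp: [f64; 10] = Default::default();
    let mut i = 0;
    while i < tmp.len() {
        tmp[i] = 1.0;
        i += 1;
    }
    // The optimizer has to see through both the loop and the temporary
    // to turn this into direct stores to `dst`.
    *dst = tmp;
}
```

Both functions leave the same values in `dst`; the second simply gives the optimizer more work to do before it can match the first.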

@jrmuizel
Contributor Author

I filed an llvm bug about this: https://bugs.llvm.org/show_bug.cgi?id=40011

@nikic nikic added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 13, 2018
@ebkalderon
Contributor

Looking at the comment responding to that bug, it seems that it might be an LLVM pass ordering issue. Just curious, but is this something that we can fix or work around on our own fork of LLVM? Or is this a more general issue that will need to be resolved upstream?

@nagisa
Member

nagisa commented Oct 13, 2019

We prefer to have as few differences between our fork and upstream as possible, and the differences we do have must pull their weight to warrant backporting them every time we bump LLVM.

@steveklabnik
Member

Triage: no change

@jrmuizel
Contributor Author

This appears to be fixed by #81451

6 participants