Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider storing auto-generated error message strings in separate object file #37512

Open
michaelwoerister opened this issue Nov 1, 2016 · 4 comments
Labels
A-codegen Area: Code generation A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Copy link
Member

Some kinds of expression can generate a runtime panic with a compiler generated error message containing the source location of that expression. Examples are arithmetic expressions that can cause integer overflow or division by zero, and expressions that result in array bounds checks.

So far we are allocating string constants in the same codegen unit as the expression, which has two disadvantages:

  • Because we do not check whether there is already a constant for that source location, we will have copies of the same data for each monomorphized instance of a function. (Or does LLVM merge equal constants if there address is never taken?)

  • Since the source location is contained in machine code, the whole object file often has to be re-compiled during incremental compilation even if just formatting has changed or comments have been added.

This could be solved by interning all those constants into a separate object file with a (semi-)stable symbol name (e.g. symbol_name = function_symbol_name + index within function). That way, only the error-message-object-file has to be regenerated when nothing but formatting has changed.

@michaelwoerister michaelwoerister added I-slow Issue: Problems and improvements with respect to performance of generated code. C-enhancement Category: An issue proposing an enhancement or a PR with one. A-codegen Area: Code generation A-incr-comp Area: Incremental compilation labels Nov 1, 2016
@retep998
Copy link
Member

retep998 commented Nov 1, 2016

For -msvc, just make each constant a COMDAT and link's /OPT:REF,ICF will remove unused constants and deduplicate them as well.

@Mark-Simulacrum
Copy link
Member

This would impact compile times as well, I believe; so perhaps I-compiletime?

In fact, if I'm interpreting correctly, this means that all x[foo] and x + 10 expressions cause an increase both final executable/library size as well as more code for LLVM to chew on.

@michaelwoerister michaelwoerister added the I-compiletime Issue: Problems and improvements with respect to compile times. label Nov 1, 2016
@michaelwoerister
Copy link
Member Author

It seems that we are already checking whether there is an existing string constant with some given contents, so file names and error message text are not duplicated. Just the constant struct containing pointers to filename and error message and the line number. This could be optimized by generating pairs of (error message, file name) so that the thing that is duplicated only contains one pointer instead of two. We could also store everything in a big array and store 32 bit indices instead of full pointers to further reduce size.

sophiajt pushed a commit to sophiajt/rust that referenced this issue Nov 2, 2016
…r=nikomatsakis

ICH: Hash expression spans if their source location is captured for panics.

Since the location of some expressions is captured in error message constants, it has an influence on machine code and consequently we need to take them into account by the incr. comp. hash. This PR makes this happen for `+, -, *, /, %` and for array indexing -- let me know if I forgot anything.

In the future we might want to change the codegen strategy for those error messages, so that they are stored in a separate object file with a stable symbol name, so that only this object file has to be regenerated when source locations change. This strategy would also eliminate unnecessary duplications due  to monomorphization, as @arielb1 has pointed out on IRC. I opened rust-lang#37512, so we don't forget about this.

r? @nikomatsakis
@nox
Copy link
Contributor

nox commented Apr 2, 2018

This may be worth investigating for code size improvements.

Cc @rust-lang/wg-codegen

@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants