Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stable metadata hashes across workspaces #3611

Merged
merged 3 commits into from
Aug 9, 2017

Conversation

tylerwhall
Copy link
Contributor

Currently a crate from a path source will have its metadata hash incorporate its absolute path on the system where it is built. This always impacts top-level crates, which means that compiling the same source with the same dependencies and compiler version will generate libraries with symbol names that vary depending on where the workspace resides on the machine.

This is hopefully a general solution to the hack we've used in meta-rust to make dynamic linking reliable.
meta-rust/meta-rust@0e6cf94

For paths inside the Cargo workspace, hash their SourceId relative to the root of the workspace. Paths outside of a workspace are still hashed as absolute.

This stability is important for reproducible builds as part of a larger build system that employs caching of artifacts, such as OpenEmbedded.

OpenEmbedded tightly controls all inputs to a build and its caching assumes that an equivalent artifact will always result from the same set of inputs. The workspace path is not considered to be an influential input, however.

For example, if Cargo is used to compile libstd shared objects which downstream crates link to dynamically, it must be possible to rebuild libstd given the same inputs and produce a library that is at least link-compatible with the original. If the build system happens to cache the downstream crates but needs to rebuild libstd and the user happens to be building in a different workspace path, currently Cargo will generate a library incompatible with the original and the downstream executables will fail at runtime on the target.

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@tylerwhall
Copy link
Contributor Author

It seems the appveyor failure is intermittent.
https://ci.appveyor.com/project/tylerwhall/cargo/build/1.0.3

@bors
Copy link
Collaborator

bors commented Feb 2, 2017

☔ The latest upstream changes (presumably #3609) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

Thanks for the PR! I think that this would also solve #3273 as well, right?

My only worry here would be that we forget to use the stable hash and instead accidentally use the Hash impl itself, but if we've got good test coverage then that shouldn't be a problem.

@alexcrichton
Copy link
Member

Patch looks great to me! Could you add new test cases though instead of extending existing ones? Just to be explicit about what's being tested.

@tylerwhall
Copy link
Contributor Author

Could you add new test cases though instead of extending existing ones?

I had to modify custom_target_no_rebuild() because it was effectively building the same crate in two different workspace-relative locations and expecting to see no rebuild. I tried to keep the same spirit of the test while making it work with the new hashes.
same_metadata_different_directory() is a new test for the metadata stability. Should I create a new test file or something else?

As for #3273, I think this may be part of the solution but as of now it will still rebuild when moving a workspace. The generated files all have the same names now, but the .fingerprint directory still has absolute path references to the old directory.

Thanks for the review!

@alexcrichton
Copy link
Member

Oh right yeah that makes sense, we can just leave #3273 open. Changing that test case though does worry me. This breaks any cases like that which are sharing directories with CARGO_TARGET_DIR and not expecting recompiles, right?

Could we perhaps only apply this change for workspace crates?

@tylerwhall
Copy link
Contributor Author

Yeah, it will recompile if you individually build a dependency and then build a dependent crate, sharing the same CARGO_TARGET_DIR without using an explicit workspace. That's already not guaranteed though, since building the source as a dependency could cause different features and other dependency versions to be resolved. Consistency in that use case seems like what workspaces were meant to solve.

The main case I'm interested in is the top-level crate which may be at its own implicit workspace root. I don't think we should require setting a workspace to get stable output out of a single crate.

I think this workspace-relative hashing will be required as part of #3273 anyway, since moving the project root moves both the crate source and the default target dir. This just takes care of the crate source component.

@alexcrichton
Copy link
Member

Yeah this use case was definitely what workspaces were intended to solve, but I worry about projects and/or systems which may not have had a chance to update yet.

I'm not really sure of the best way to gauge what sort of breakage this would cause though. We could try landing it for awhile and see if anyone complains, but that still carries some risk with it.

We could also try posting on internals and ask around if anyone's still relying on this to gauge.

@alexcrichton
Copy link
Member

cc rust-lang/rust#39786, I think this PR would solve that

@bors
Copy link
Collaborator

bors commented Mar 12, 2017

☔ The latest upstream changes (presumably #3818) made this pull request unmergeable. Please resolve the merge conflicts.

@semarie
Copy link
Contributor

semarie commented Jun 25, 2017

@tylerwhall any update about this PR ?

Switching to workspace-relative metadata hashing means that output can
change based on what is considered to be the top-level crate. This test
is affected by this change, but in general hashes should be more stable
than before.

This test verifies that a crate will not rebuild when built as a
dependency of two different crates but with the same CARGO_TARGET_DIR.
This currently relies on the absolute path to the crate being the same
across runs. Set an explicit workspace so the hashes will be the same
when using workspace-relative hashing.

Setting a workspace would cause the target dir to be the same in both
invocations, defeating the intent of this test, so move the target dir
between runs to verify that the contents of the target dir are not
dependent on its path.
Currently a crate from a path source will have its metadata hash
incorporate its absolute path on the system where it is built. This
always impacts top-level crates, which means that compiling the same
source with the same dependencies and compiler version will generate
libraries with symbol names that vary depending on where the workspace
resides on the machine.

For paths inside the Cargo workspace, hash their SourceId relative to
the root of the workspace. Paths outside of a workspace are still hashed
as absolute.

This stability is important for reproducible builds as part of a larger
build system that employs caching of artifacts, such as OpenEmbedded.
OpenEmbedded tightly controls all inputs to a build and its caching
assumes that an equivalent artifact will always result from the same set
of inputs. The workspace path is not considered to be an influential
input, however.

For example, if Cargo is used to compile libstd shared objects which
downstream crates link to dynamically, it must be possible to rebuild
libstd given the same inputs and produce a library that is at least
link-compatible with the original. If the build system happens to cache
the downstream crates but needs to rebuild libstd and the user happens
to be building in a different workspace path, currently Cargo will
generate a library incompatible with the original and the downstream
executables will fail at runtime on the target.
@tylerwhall
Copy link
Contributor Author

tylerwhall commented Jun 26, 2017

I rebased the PR.

@alexcrichton, I think we should try to move forward with this slight change in behavior. I don't know how to fix #3273, rust-lang/rust#39786 without removing the absolute path from the hash.

@bors
Copy link
Collaborator

bors commented Jun 30, 2017

☔ The latest upstream changes (presumably #4214) made this pull request unmergeable. Please resolve the merge conflicts.

@semarie
Copy link
Contributor

semarie commented Jun 30, 2017

the travis fails was a timeout on Darwin, and now there is another rebase needed :-/

@alexcrichton
Copy link
Member

Egad I'm sorry I let this sit so long @tylerwhall! The recent changes look good to me, though, so let's land this and see how it goes.

@bors: r+

@bors
Copy link
Collaborator

bors commented Aug 9, 2017

📌 Commit eb0e4c0 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Aug 9, 2017

⌛ Testing commit eb0e4c0 with merge d2e8587...

bors added a commit that referenced this pull request Aug 9, 2017
Stable metadata hashes across workspaces

Currently a crate from a path source will have its metadata hash incorporate its absolute path on the system where it is built. This always impacts top-level crates, which means that compiling the same source with the same dependencies and compiler version will generate libraries with symbol names that vary depending on where the workspace resides on the machine.

This is hopefully a general solution to the hack we've used in meta-rust to make dynamic linking reliable.
meta-rust/meta-rust@0e6cf94

For paths inside the Cargo workspace, hash their SourceId relative to the root of the workspace. Paths outside of a workspace are still hashed as absolute.

This stability is important for reproducible builds as part of a larger build system that employs caching of artifacts, such as OpenEmbedded.

OpenEmbedded tightly controls all inputs to a build and its caching assumes that an equivalent artifact will always result from the same set of inputs. The workspace path is not considered to be an influential input, however.

For example, if Cargo is used to compile libstd shared objects which downstream crates link to dynamically, it must be possible to rebuild libstd given the same inputs and produce a library that is at least link-compatible with the original. If the build system happens to cache the downstream crates but needs to rebuild libstd and the user happens to be building in a different workspace path, currently Cargo will generate a library incompatible with the original and the downstream executables will fail at runtime on the target.
@bors
Copy link
Collaborator

bors commented Aug 9, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing d2e8587 to master...

@bors bors merged commit eb0e4c0 into rust-lang:master Aug 9, 2017
@ehuss ehuss added this to the 1.21.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants