Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using random keys for incr. comp. hashing #129272

Open
michaelwoerister opened this issue Aug 19, 2024 · 4 comments
Open

Consider using random keys for incr. comp. hashing #129272

michaelwoerister opened this issue Aug 19, 2024 · 4 comments
Labels
A-incr-comp Area: Incremental compilation A-reproducibility Area: Reproducible / Deterministic builds C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Copy link
Member

There's been recent discussion about the problems of using unkeyed SipHash128 in the compiler and if that could be exploited by an attacker.

With respect to incremental compilation, it would be possible to generate random keys and cache them together with the dep-graph. These keys could then affect query result fingerprints and dep-node identifiers. Any new from-scratch compilation session would generate new keys, so finding stable collisions should be impossible.

The only downside is that it would be hard to reproduce an actual collision if we ever found one because the keys have to be known for that. However, reproducing collisions that are due to faulty HashStable impls (which is the much more likely case) should be reproducible independent of the keys being used.

@michaelwoerister michaelwoerister added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-incr-comp Area: Incremental compilation labels Aug 19, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 19, 2024
@Mark-Simulacrum
Copy link
Member

We should also consider the perf stability - my guess is that this would hurt our reproducibility there. Maybe we can have a -Z or environment variable opt out.

@michaelwoerister
Copy link
Member Author

Yes, we'll want to have a way to explicitly set the keys in any case. I imagine that (for example) unstable fingerprint ICE messages would print the keys and how to invoke the compiler for reproducing.

@jieyouxu jieyouxu added the A-reproducibility Area: Reproducible / Deterministic builds label Aug 19, 2024
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 20, 2024
@briansmith
Copy link
Contributor

The only downside is [...]

Besides that downside, other downsides:

  • Every process would need to read the key from disk at least once, which costs at least an extra syscall for the read.
  • We'd have a secret key. Now we are doing key storage and key management. Usually this ends up being a lot more work than expected. We'd forever be bothered by people filing security bug reports saying we're not protecting the key good enough.
  • Any future attempt to distribute the work across multiple systems would require distributing the key or pinning shards of the inputs to the same worker systems. The key distribution approach would be complicated because you then have to have a security analysis of the security of sharing the key across the systems. My understanding is that the pinning approach tends be avoided because it results in unfair loading of systems.

@michaelwoerister
Copy link
Member Author

Thanks for the comments, @briansmith!

Every process would need to read the key from disk at least once, which costs at least an extra syscall for the read.

I'm certain that's not an issue in practice. The compiler already reads many files for incr. comp.

We'd have a secret key. Now we are doing key storage and key management. Usually this ends up being a lot more work than expected. We'd forever be bothered by people filing security bug reports saying we're not protecting the key good enough.

Yes, I can imagine that that's a problem. Maybe the solution is to not call it a "key"? It's a salt to prevent being able craft a set of two commits that compiled one after the other would reliably resulting a collision.

Any future attempt to distribute the work across multiple systems would require distributing the key or pinning shards of the inputs to the same worker systems.

The compiler's approach to incr. comp. doesn't really support this scenario to begin with. But I would imagine that a build system trying to do something like that would explicitly set a single key for all agents involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation A-reproducibility Area: Reproducible / Deterministic builds C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants