Consider using random keys for incr. comp. hashing #129272

michaelwoerister · 2024-08-19T16:16:59Z

There's been recent discussion about the problems of using unkeyed SipHash128 in the compiler and whether that could be exploited by an attacker.

With respect to incremental compilation, it would be possible to generate random keys and cache them together with the dep-graph. These keys could then affect query result fingerprints and dep-node identifiers. Any new from-scratch compilation session would generate new keys, so finding stable collisions should be impossible.
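To make the idea concrete, here is a minimal sketch, not the compiler's actual code. `SessionKeys` and `fingerprint` are hypothetical names, and the standard library's `DefaultHasher` (whose algorithm is unspecified and returns 64 bits) stands in for the compiler's SipHash128. The keys are simply hashed in ahead of the value, which is closer to a salt than to a true keyed hash.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical per-session keys. In the proposal they would be
/// cached on disk together with the dep-graph.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SessionKeys {
    k0: u64,
    k1: u64,
}

impl SessionKeys {
    /// Draw fresh keys, as a from-scratch session would.
    /// `RandomState` is randomly seeded by std, so finishing an
    /// empty hasher yields a value derived from that seed.
    fn random() -> Self {
        let k0 = RandomState::new().build_hasher().finish();
        let k1 = RandomState::new().build_hasher().finish();
        SessionKeys { k0, k1 }
    }
}

/// Stand-in for a query result fingerprint: the keys are fed
/// into the hasher before the value itself.
fn fingerprint<T: Hash>(keys: SessionKeys, value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    keys.k0.hash(&mut hasher);
    keys.k1.hash(&mut hasher);
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Keys that would be loaded from the cache: the same input
    // hashes to the same fingerprint within the session.
    let cached = SessionKeys::random();
    let first = fingerprint(cached, &"fn foo() {}");
    let second = fingerprint(cached, &"fn foo() {}");
    assert_eq!(first, second);

    // A from-scratch session draws new keys, so its fingerprints
    // are unrelated to the previous session's.
    let fresh = SessionKeys::random();
    println!("cached session: {:016x}", first);
    println!("fresh session:  {:016x}", fingerprint(fresh, &"fn foo() {}"));
}
```

Within one session the keys never change, so fingerprints stay comparable across incremental runs. An input pair crafted to collide under one set of keys would not be expected to collide under another.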

The only downside is that it would be hard to reproduce an actual collision if we ever found one, because the keys have to be known for that. However, collisions that are due to faulty HashStable impls (which is the much more likely case) should be reproducible independent of the keys being used.


Mark-Simulacrum · 2024-08-19T16:46:22Z

We should also consider the perf stability - my guess is that this would hurt our reproducibility there. Maybe we can have a -Z or environment variable opt out.

michaelwoerister · 2024-08-19T17:40:06Z

Yes, we'll want to have a way to explicitly set the keys in any case. I imagine that (for example) unstable fingerprint ICE messages would print the keys and how to invoke the compiler for reproducing.
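A rough sketch of what such an override could look like, under loudly stated assumptions: the environment variable name `HYPOTHETICAL_INCR_HASH_KEYS`, the `k0:k1` hex format, and the helper functions are all invented for illustration and do not correspond to any existing rustc flag or variable.

```rust
use std::env;

/// Hypothetical variable name; no such rustc option exists today.
const KEYS_ENV_VAR: &str = "HYPOTHETICAL_INCR_HASH_KEYS";

/// Parse keys written as two hex numbers separated by a colon.
fn parse_keys(text: &str) -> Option<(u64, u64)> {
    let (first, second) = text.split_once(':')?;
    let k0 = u64::from_str_radix(first, 16).ok()?;
    let k1 = u64::from_str_radix(second, 16).ok()?;
    Some((k0, k1))
}

/// Render keys in the same format that `parse_keys` accepts.
fn format_keys(keys: (u64, u64)) -> String {
    format!("{:016x}:{:016x}", keys.0, keys.1)
}

/// Text an unstable-fingerprint error could append so that the
/// failing session can be replayed with identical keys.
fn repro_hint(keys: (u64, u64)) -> String {
    format!("to reproduce, rerun with {}={}", KEYS_ENV_VAR, format_keys(keys))
}

fn main() {
    // An explicit setting wins. The fixed fallback is a placeholder:
    // the proposal would draw random keys at this point instead.
    let keys = env::var(KEYS_ENV_VAR)
        .ok()
        .and_then(|value| parse_keys(&value))
        .unwrap_or((0, 0));
    println!("{}", repro_hint(keys));
}
```

The same mechanism would cover the perf-stability concern: a benchmarking setup could pin the keys to a fixed value so that runs stay comparable.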

briansmith · 2024-08-21T20:13:06Z

> The only downside is [...]

Besides that, there are other downsides:

- Every process would need to read the key from disk at least once, which costs at least an extra syscall for the read.
- We'd have a secret key. Now we are doing key storage and key management. Usually this ends up being a lot more work than expected. We'd forever be bothered by people filing security bug reports saying we're not protecting the key well enough.
- Any future attempt to distribute the work across multiple systems would require distributing the key or pinning shards of the inputs to the same worker systems. The key distribution approach would be complicated because you would then need a security analysis of sharing the key across the systems. My understanding is that the pinning approach tends to be avoided because it results in unfair loading of systems.

michaelwoerister · 2024-08-22T07:51:21Z

Thanks for the comments, @briansmith!

> Every process would need to read the key from disk at least once, which costs at least an extra syscall for the read.

I'm certain that's not an issue in practice. The compiler already reads many files for incr. comp.

> We'd have a secret key. Now we are doing key storage and key management. Usually this ends up being a lot more work than expected. We'd forever be bothered by people filing security bug reports saying we're not protecting the key well enough.

Yes, I can imagine that being a problem. Maybe the solution is to not call it a "key"? It's a salt that prevents an attacker from crafting a pair of commits that, when compiled one after the other, would reliably result in a collision.

> Any future attempt to distribute the work across multiple systems would require distributing the key or pinning shards of the inputs to the same worker systems.

The compiler's approach to incr. comp. doesn't really support this scenario to begin with. But I would imagine that a build system trying to do something like that would explicitly set a single key for all agents involved.

michaelwoerister added the C-enhancement (Category: An issue proposing an enhancement or a PR with one), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue), and A-incr-comp (Area: Incremental compilation) labels Aug 19, 2024

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 19, 2024

jieyouxu added the A-reproducibility Area: Reproducible / Deterministic builds label Aug 19, 2024

saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 20, 2024
