Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check performance of tset artifactvalues cache #735

Open
cjhopman opened this issue Aug 7, 2024 · 3 comments
Open

Check performance of tset artifactvalues cache #735

cjhopman opened this issue Aug 7, 2024 · 3 comments

Comments

@cjhopman
Copy link
Contributor

cjhopman commented Aug 7, 2024

There's an interesting issue in bazel here: bazelbuild/bazel#21378. There's a repro for a bad perf case of their merkle tree cache here: https://github.com/DavidANeil/repro-bazel-nested-set/tree/master. That cache seems like it likely does something very similar to how we cache the ActionSharedDirectory for tset nodes, the repro might be a good example for us to check performance on.

@cjhopman
Copy link
Contributor Author

cjhopman commented Aug 7, 2024

Other relevant bazel discussions in bazelbuild/bazel#20862 and bazelbuild/bazel#18686

@cjhopman
Copy link
Contributor Author

cjhopman commented Aug 7, 2024

Confirmed that buck2 has similar poor performance here: https://github.com/cjhopman/repro-bazel-nested-set

buck1 approach is potentially more performant here. It maintains an map of tsetnode equivalent -> interned MerkleTreeNode. and then a MerkleTreeNode lazily computes and caches its encoded form only when requested. see https://github.com/facebook/buck/blob/9c7c421e49f4d92d67321f18c6d1cd90974c77c4/src/com/facebook/buck/remoteexecution/util/MerkleTreeNodeCache.java and https://github.com/facebook/buck/blob/9c7c421e49f4d92d67321f18c6d1cd90974c77c4/src/com/facebook/buck/rules/modern/builders/ModernBuildRuleRemoteExecutionHelper.java#L531

@cjhopman cjhopman changed the title Check performance of depset artifactvalues cache Check performance of tset artifactvalues cache Aug 7, 2024
@cjhopman
Copy link
Contributor Author

cjhopman commented Aug 7, 2024

That model doesn't fit great into buck and there's a question of the cost of interning. But actually I think it could be fine to just depend on sharing based on the dice value cached for the tset already for the interning part of that.

but still, it might just be too costly to do all the tree merges themselves.

if this is something like a flat directory of items, and each item is a value in one tset and then there's a densish graph between them, producing the merged directories is going to be O(n^2). lazily doing the fingerprinting just improves the constant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant