Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Use correct commit hash as github cache key if submodule is not checked out #107

Merged
merged 2 commits into from
Apr 7, 2024

Conversation

I-Al-Istannen
Copy link
Collaborator

Previously, the CI workflow file used git -C src/tools/enzyme to try and execute a rev-parse inside the submodule. Unfortunately, this only works if that submodule is actually checked out and enzyme does not specify that. I am not quite sure where they are checked out at all, but clearly they are at some point.
The commit hash chosen for the key #86 (comment) here is neither the new nor the old enzyme commit, so this evidently didn't quite work out.

This PR changes the lookup to refer to the submodule instead, which should work regardless of its status on disk.

@ZuseZ4
Copy link
Member

ZuseZ4 commented Apr 3, 2024

Rust (x.py) will check out submodules when building rustc - at least iirc.

Per discussion, postponed till the weekend.

@@ -28,7 +28,7 @@ jobs:
- uses: dtolnay/rust-toolchain@nightly
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for identifying the problem. Would a better solution not be to guarantee that the submodules are checked out as part of the checkout action?

Suggested change
- uses: dtolnay/rust-toolchain@nightly
submodules: true
- uses: dtolnay/rust-toolchain@nightly

Copy link
Collaborator Author

@I-Al-Istannen I-Al-Istannen Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this a bit more in Discord yesterday and yes, currently your git submodules are out of sync with their state in git. This PR will therefore produce an out of sync cache — the initial checkout should recurse into submodules.

We decided to postpone that change (and merging this PR) until after deadlines and papers are met, as it has the potential to cause a CI failure :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We came to basically the same conclusion yesterday you also came to now. I will reproduce the gist of the messages here, to also have them in GitHub.


The checkout action seems to not necessarily purge old state when it runs. Therefore, old versions of your source tree can still linger on the runner machines. In the case of Enzyme-Rust an old version of enzyme was checked out into src/tools/enzyme at some point (Ref: 7e3e90f4287068a41d3fb5a99127ee2857353b04). Normally the old versions are cleaned up by the checkout action which resets the state to whatever the user/pipeline specified. Here, however, submodules are skipped in the yaml action configuration and therefore the old submodules are never updated.

Consequently, the CI pipeline currently builds with random, existing versions of submodules that do not reflect the state in git. This is quite bad IMHO.

The cache is currently consistent with the state of the runner file tree, so GitHub will restore an old cache but the name of the cache at least accurately describes its contents. With this PR, the cache will be tagged correctly according to git, but as your local state is out of sync with git, this won't really improve things.


To fix this I will try to check out all submodules first and see if that is fast enough in the CI (llvm is a bit hefty, I am not sure how nicely the checkout action optimizes this with existing local state). This is the ideal solution as git, the cache and the local file system will all agree on a single version. If it is too slow we need to be a bit more granular.

Copy link

@jedbrown jedbrown Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I recall seeing x build check out the submodule automatically so I suspect we are currently building with the correct submodule, but it's tagged with a stale hash (that might even be from the parent repo -- we've seen at least one case where the llvm and enzyme caches use the same hash).

This will also initialize all the "doc" submodules, which might prove
too much.
@ZuseZ4 ZuseZ4 merged commit af4d766 into EnzymeAD:master Apr 7, 2024
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants