Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(test-boot): split up test jobs via AVA recipe #9511

Merged
merged 2 commits into from
Jun 17, 2024
Merged

Conversation

michaelfig
Copy link
Member

@michaelfig michaelfig commented Jun 14, 2024

Incidental

test-boot takes too long to run as a single job (almost 30 minutes). Instead, create a new strategy matrix dimension to indicate how to split the test jobs via AVA's support for https://github.com/avajs/ava/blob/main/docs/recipes/splitting-tests-ci.md#splitting-tests-on-github-actions

Also, deduplicate and replace the older glob-based SwingSet-test splitting with the more general AVA splitting.

Before: 32m 3s
After: 15m 53s

@michaelfig michaelfig self-assigned this Jun 14, 2024
@michaelfig michaelfig added the tooling repo-wide infrastructure label Jun 14, 2024
Copy link

cloudflare-workers-and-pages bot commented Jun 14, 2024

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: 50229bd
Status: ✅  Deploy successful!
Preview URL: https://541c2ff2.agoric-sdk.pages.dev
Branch Preview URL: https://mfig-split-test-boot.agoric-sdk.pages.dev

View logs

@michaelfig michaelfig requested a review from turadg June 14, 2024 21:50
@michaelfig michaelfig force-pushed the mfig-split-test-boot branch 2 times, most recently from b339f61 to 5a917f8 Compare June 15, 2024 03:30
@michaelfig michaelfig requested a review from mhofman June 15, 2024 03:31
@michaelfig michaelfig marked this pull request as ready for review June 15, 2024 03:31
@@ -21,7 +21,7 @@ const test = anyTest as TestFn<
const auctioneerPath = 'published.auction';

test.before(async t => {
const context = await makeLiquidationTestContext(t);
const context = await makeLiquidationTestContext(t, 'bundles/addAssets');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of separate caches is fewer cache hits. Having one cache per test.before is the same as no cache.

Did you observe corruption? I think we can and should fix that in the caching bundler. Would you file an issue with examples of the problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed corruption, where one test reading a partially-built (or build-in-progress) cache cannot parse the truncated JSON files.

I eliminated the corruption by using lockfiles in endojs/endo#1544, but this had the problem that stale lockfiles would deadlock bundle-source (endojs/endo#2163).

endojs/endo#1714 was touted as a way to solve this deadlock, but it reintroduced the original corruption problem.

If we're going to fix this thoroughly while preventing bundle-source running on the same bundle cache in separate processes from stomping on one another, I think we need to go back to my lockfile implementation, and add an expiry time for the generated lockfiles.

@kriskowal ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To try to provoke the corruption I saw, I'm using a single bundleDir for all the tests. Then I'll file an issue if I can see it failing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A repro would be good. I was unaware that the lockfile was addressing a corruption issue that using an atomic posix rename upon write-completion wouldn’t solve. I’m hoping that this is just a matter of applying the atomic rename strategy harder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that atomic rename still has a race because there are two files in question: the -js-meta.json file and the .js file. I haven't convinced myself that all the possible interleavings of two separate processes updating the same bundle will work out correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure myself. We should have spaghetti with chopsticks some time. I think the trick is usually to be sure you read, write, and stat the two files in the same order when reading or writing. There might be a retry loop in some cases. Is the concurrency due to Ava? Are we writing and reading bundles from disk concurrently in tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this case, the concurrency is due to AVA. I'm loathe to reduce AVA's concurrency since that's a bottleneck for agoric-sdk CI testing performance.

.github/workflows/test-all-packages.yml Outdated Show resolved Hide resolved
@@ -468,6 +468,11 @@ jobs:
strategy:
matrix:
engine: ['18.x', '20.x', 'xs']
testglob:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, very cool! Thanks, I will try that.

@michaelfig michaelfig changed the title ci(test-boot): split up test jobs by matrix.testglob ci(test-boot): split up boot and swingset test jobs via AVA recipe Jun 15, 2024
@michaelfig michaelfig changed the title ci(test-boot): split up boot and swingset test jobs via AVA recipe ci(test-boot): split up test jobs via AVA recipe Jun 15, 2024
@michaelfig michaelfig force-pushed the mfig-split-test-boot branch 4 times, most recently from 6a44aef to 2b81eff Compare June 16, 2024 01:02
@michaelfig michaelfig requested a review from turadg June 16, 2024 01:37
@michaelfig michaelfig added the automerge:no-update (expert!) Automatically merge without updates label Jun 17, 2024
@mergify mergify bot merged commit 5375019 into master Jun 17, 2024
85 checks passed
@mergify mergify bot deleted the mfig-split-test-boot branch June 17, 2024 19:48
mhofman pushed a commit that referenced this pull request Jun 20, 2024
_Incidental_

`test-boot` takes too long to run as a single job (almost 30 minutes).  Instead, create a new strategy matrix dimension to indicate how to split the test jobs via AVA's support for https://github.com/avajs/ava/blob/main/docs/recipes/splitting-tests-ci.md#splitting-tests-on-github-actions 

Also, deduplicate and replace the older glob-based SwingSet-test splitting with the more general AVA splitting.

Before: [32m 3s](https://github.com/Agoric/agoric-sdk/actions/runs/9530612824/usage)
After: [15m 53s](https://github.com/Agoric/agoric-sdk/actions/runs/9532147567/usage)
mhofman pushed a commit that referenced this pull request Jun 22, 2024
_Incidental_

`test-boot` takes too long to run as a single job (almost 30 minutes).  Instead, create a new strategy matrix dimension to indicate how to split the test jobs via AVA's support for https://github.com/avajs/ava/blob/main/docs/recipes/splitting-tests-ci.md#splitting-tests-on-github-actions 

Also, deduplicate and replace the older glob-based SwingSet-test splitting with the more general AVA splitting.

Before: [32m 3s](https://github.com/Agoric/agoric-sdk/actions/runs/9530612824/usage)
After: [15m 53s](https://github.com/Agoric/agoric-sdk/actions/runs/9532147567/usage)
mhofman added a commit that referenced this pull request Jun 26, 2024
Rebase todo:

```
# Branch fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487-
label base-fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487-
pick bcecf52 test(vow): add test of more vow upgrade scenarios
pick d7135b2 test: switch vow test to run under xs for metering
pick 99fccca test(vow): add test for resolving vow to external promise
pick 6d3f88c test(vow): add test for vow based infinite vat ping pong
pick c78ff0e test(vow): check vow consumers for busy loops or hangs
pick 3c63cba fix(vow): prevent loops and hangs from watching promises
pick 188c810 chore(vat-data): remove the deprecated `@agoric/vat-data/vow.js`
pick 44a6d16 fix(vow): allow resolving vow to external promise
label fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487-
reset base-fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487-
merge -C 4fca040 fix-vow-make-watch-when-more-robust-against-loops-and-hangs-9487- # fix(vow): make watch/when more robust against loops and hangs (#9487)

# Branch ci-mergify-strip-merge-commit-HTML-comments-9499-
label base-ci-mergify-strip-merge-commit-HTML-comments-9499-
pick 63e21ab ci(mergify): strip merge commit HTML comments
label ci-mergify-strip-merge-commit-HTML-comments-9499-
reset base-ci-mergify-strip-merge-commit-HTML-comments-9499-
merge -C 7b93671 ci-mergify-strip-merge-commit-HTML-comments-9499- # ci(mergify): strip merge commit HTML comments (#9499)

# Pull request #9506
pick 249a5d4 fix(SwingSet): Undo deviceTools behavioral change from #9153 (#9506)

# Pull request #9507
pick a19a964 fix(liveslots): promise watcher to cause unhandled rejection if no handler (#9507)

# Branch feat-make-vat-localchain-resumable-9488-
label base-feat-make-vat-localchain-resumable-9488-
pick 76c17c6 feat: make vat-localchain resumable
pick 40ccba1 fix(vow): correct the typing of `unwrap`
pick 90e062c fix(localchain): work around TypeScript mapped tuple bug
pick 3027adf fix(network): use new `ERef` and `FarRef`
label feat-make-vat-localchain-resumable-9488-
reset base-feat-make-vat-localchain-resumable-9488-
merge -C 5856dc0 feat-make-vat-localchain-resumable-9488- # feat: make vat-localchain resumable (#9488)

# Branch ci-mergify-clarify-queue-conditions-and-merge-conditions-9510-
label base-ci-mergify-clarify-queue-conditions-and-merge-conditions-9510-
pick 7247bd9 ci(mergify): clarify `queue_conditions` and `merge_conditions`
label ci-mergify-clarify-queue-conditions-and-merge-conditions-9510-
reset base-ci-mergify-clarify-queue-conditions-and-merge-conditions-9510-
merge -C 30e56ae ci-mergify-clarify-queue-conditions-and-merge-conditions-9510- # ci(mergify): clarify `queue_conditions` and `merge_conditions` (#9510)

# Branch fix-liveslots-cache-delete-does-not-return-a-useful-value-9509-
label base-fix-liveslots-cache-delete-does-not-return-a-useful-value-9509-
pick 42ea8a3 fix(liveslots): cache.delete() does not return a useful value
label fix-liveslots-cache-delete-does-not-return-a-useful-value-9509-
reset base-fix-liveslots-cache-delete-does-not-return-a-useful-value-9509-
merge -C a2e54e1 fix-liveslots-cache-delete-does-not-return-a-useful-value-9509- # fix(liveslots): cache.delete() does not return a useful value (#9509)

# Branch chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694-
label base-chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694-
pick 9930bd3 chore(swingset): re-enable test of unrecognizable orphan cleanup
label chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694-
reset base-chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694-
merge -C bc53ef7 chore-swingset-re-enable-test-of-unrecognizable-orphan-cleanup-8694- # chore(swingset): re-enable test of unrecognizable orphan cleanup (#8694)

# Pull request #9508
pick 513adc9 refactor(internal): move async helpers using AggregateError to node (#9508)

# Branch report-bundle-sizing-in-agoric-run-9503-
label base-report-bundle-sizing-in-agoric-run-9503-
pick 68af59c refactor: inline addRunOptions
pick a0115ed feat: writeCoreEval returns plan
pick bd0edcb feat: stat-bundle and stat-plan scripts
pick 0405202 feat: agoric run --verbose
pick 22b43da feat(stat-bundle): show CLI to explode the bundle
label report-bundle-sizing-in-agoric-run-9503-
reset base-report-bundle-sizing-in-agoric-run-9503-
merge -C 7b30169 report-bundle-sizing-in-agoric-run-9503- # report bundle sizing in agoric run (#9503)

# Branch ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511-
label base-ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511-
pick 5f3c1d1 test(boot): use a single bundle directory for all tests
pick 50229bd ci(all-packages): split tests according to AVA recipe
label ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511-
reset base-ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511-
merge -C 5375019 ci-test-boot-split-up-test-jobs-via-AVA-recipe-9511- # ci(test-boot): split up test jobs via AVA recipe (#9511)

# Pull request #9514
pick f908f89 fix: endow with original unstructured `assert` (#9514)

# Pull request #9535
pick 989aa19 fix(swingset): log vat termination and upgrade better (#9535)

# Branch types-zoe-setTestJig-param-type-optional-9533-
label base-types-zoe-setTestJig-param-type-optional-9533-
pick 426a3be types(zoe): setTestJig param type optional
label types-zoe-setTestJig-param-type-optional-9533-
reset base-types-zoe-setTestJig-param-type-optional-9533-
merge -C bf9f03b types-zoe-setTestJig-param-type-optional-9533- # types(zoe): setTestJig param type optional (#9533)

# Branch adopt-TypeScript-5-5-9476-
label base-adopt-TypeScript-5-5-9476-
pick 381b6a8 chore(deps): bump Typescript to 5.5 release
label adopt-TypeScript-5-5-9476-
reset base-adopt-TypeScript-5-5-9476-
merge -C 0cfea88 adopt-TypeScript-5-5-9476- # adopt TypeScript 5.5 (#9476)

# Branch retry-flaky-agoric-cli-integration-test-9550-
label base-retry-flaky-agoric-cli-integration-test-9550-
pick 2a68510 ci: retry agoric-cli integration test
label retry-flaky-agoric-cli-integration-test-9550-
reset base-retry-flaky-agoric-cli-integration-test-9550-
merge -C c5c52ec retry-flaky-agoric-cli-integration-test-9550- # retry flaky agoric-cli integration test (#9550)

# Pull Request #9556
pick 0af876f fix(vow): watcher args instead of context (#9556)

# Pull Request #9561
pick a4f86eb fix(vow): handle resolution loops in vows (#9561)

# Branch Restore-a3p-tests-9557-
label base-Restore-a3p-tests-9557-
pick d36382d chore(a3p): restore localchain test
pick 5ff628e Revert "test: drop or clean-up old Tests"
pick b5cf8bd fix(localchain): `callWhen`s return `PromiseVow`
label Restore-a3p-tests-9557-
reset base-Restore-a3p-tests-9557-
merge -C c65915e Restore-a3p-tests-9557- # Restore a3p tests (#9557)

# Pull Request #9559
pick 6073b2b fix(agoric): convey tx opts to `agoric wallet` and subcommands (#9559)

# Branch explicit-heapVowTools-9548-
label base-explicit-heapVowTools-9548-
pick 100de68 feat!: export heapVowTools
pick 8cb1ee7 refactor: use heapVowTools import
pick 0ac6774 docs: vow vat utils
pick 9128f27 feat: export heapVowE
pick 3b0c8c1 refactor: use heapVowE
pick 9b84bfa BREAKING CHANGE: drop V export
pick 6623af5 chore(types): concessions to prepack
label explicit-heapVowTools-9548-
reset base-explicit-heapVowTools-9548-
merge -C 4440ce1 explicit-heapVowTools-9548- # explicit heapVowTools (#9548)

# Pull Request #9564
pick 44926a7 fix(bn-patch): fix bad html evasion (#9564)

# Branch Fix-upgrade-behaviors-9526-
label base-Fix-upgrade-behaviors-9526-
pick ef1e0a2 feat: Upgrade Zoe
pick e4cc97c Revert "fix(a3p-integration): workaround zoe issues"
pick 84dd229 feat: repair KREAd contract on zoe upgrade
pick cb77160 test: validate KREAd character purchase
pick e1d961e test: move vault upgrade from test to use phase (#9536)
label Fix-upgrade-behaviors-9526-
reset base-Fix-upgrade-behaviors-9526-
merge -C 8d05faf Fix-upgrade-behaviors-9526- # Fix upgrade behaviors (#9526)

# Branch Support-for-snapshots-export-command-9563-
label base-Support-for-snapshots-export-command-9563-
pick 2a3976e refactor(cosmos): use DefaultBaseappOptions for newApp
pick 84208e9 fix(cosmos): use dedicated dedicate app creator for non start commands
pick 8c1a62d chore(cosmos): refactor cosmos command extension
pick 4386f8e feat(cosmos): support snapshot export
pick 2dabb52 test(a3p): add test for snapshots export and restore
label Support-for-snapshots-export-command-9563-
reset base-Support-for-snapshots-export-command-9563-
merge -C 309c7e1 Support-for-snapshots-export-command-9563- # Support for `snapshots export` command (#9563)

# Branch Swing-store-export-data-outside-of-genesis-file-9549-
label base-Swing-store-export-data-outside-of-genesis-file-9549-
pick f1eacbe fix(x/swingset): handle defer errors on export write
pick 496a430 feat(cosmos): add hooking kv reader
pick f476bd5 feat(cosmos): separate swing-store export data from genesis file
pick 17a5374 test(a3p): add genesis fork acceptance test
label Swing-store-export-data-outside-of-genesis-file-9549-
reset base-Swing-store-export-data-outside-of-genesis-file-9549-
merge -C 3aa5d66 Swing-store-export-data-outside-of-genesis-file-9549- # Swing-store export data outside of genesis file (#9549)

# Branch Remove-scaled-price-authority-upgrade-9585-
label base-Remove-scaled-price-authority-upgrade-9585-
pick bce49e3 test: add test during upgradeVaults; vaults detect new prices
pick 88f6500 test: repair test by dropping upgrade of scaledPriceAuthorities
label Remove-scaled-price-authority-upgrade-9585-
reset base-Remove-scaled-price-authority-upgrade-9585-
merge -C 8376991 Remove-scaled-price-authority-upgrade-9585- # Remove scaled price authority upgrade (#9585)

# Branch feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575-
label base-feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575-
pick 95174a2 feat(builders): non-ambient `strictPriceFeedProposalBuilder` in `priceFeedSupport.js`
pick 5cc190d feat(app): establish mechanism for adding core proposals by `upgradePlan.name`
pick 52f02b7 fix(builders): use proper `oracleBrand` subkey case
pick ddc072d chore(cosmos): extract `app/upgrade.go`
pick b3182a4 chore: fix error handling of upgrade vaults proposal
pick ea568a2 fix: retry upgrade vaults price quote
label feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575-
reset base-feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575-
merge -C cbe061c feat-make-software-upgrade-coreProposals-conditional-on-upgrade-plan-name-9575- # feat: make software upgrade `coreProposals` conditional on upgrade plan name (#9575)
```

This is followed by a commit to remove the `orchestration` and
`async-flow` packages from the release, as these are not baked in yet
and are not deployed anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:no-update (expert!) Automatically merge without updates tooling repo-wide infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants