Fix CI failures #938
Conversation
Overall I think the changes make sense in any case, although I'm still a little bit confused as to why CI tests started to fail, which makes me doubt whether it is due to a change in GH or a change that we introduced 🤔
@@ -217,6 +217,9 @@ func TestBrowserLogIterationID(t *testing.T) {
}

func TestMultiBrowserPanic(t *testing.T) {
	// TODO: This test never works on CI, fix it.
	t.Skip("skipping, fix this")
I wonder if we should use an env var, set by us in CI, as a flag to decide whether to skip this test. That way we would at least always run it when testing locally. Do you think that's worth it?
Makes sense; I was thinking the same. Especially for the disable-gpu one. We can detect whether we're running on GitHub by checking the GITHUB_ACTIONS environment variable.
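A minimal sketch of what that could look like (the skipOnGitHubCI helper, the package name, and the skip message are illustrative, not part of this PR; GitHub Actions sets GITHUB_ACTIONS to "true" on its runners):

package tests

import (
	"os"
	"testing"
)

// skipOnGitHubCI skips the calling test when it runs on GitHub Actions,
// which sets the GITHUB_ACTIONS environment variable to "true" on its
// runners. Local `go test` runs are unaffected.
func skipOnGitHubCI(t *testing.T) {
	t.Helper()
	if os.Getenv("GITHUB_ACTIONS") == "true" {
		t.Skip("skipping: flaky on GitHub Actions CI")
	}
}

func TestMultiBrowserPanic(t *testing.T) {
	skipOnGitHubCI(t)
	// ... the actual test body would follow here ...
}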
LGTM. I think it would be good to get some clarification on the changes so that we're aligned 🙂
Thanks for tackling this issue!!
@@ -21,6 +21,19 @@ jobs:
        platform: [ubuntu-latest]
    runs-on: ${{ matrix.platform }}
    steps:
      - name: Setup Chrome
This is interesting! So we were relying on Chrome already being present in the CI test VMs/Containers up until we started seeing these issues?
Yeah, it's interesting, indeed. It didn't work when I removed it.
    export GOMAXPROCS=1
  fi
  # Run with less concurrency to reduce CI flakiness.
  export GOMAXPROCS=1
I'm not sure about this change. I think it makes sense, since when we see flaky tests we don't have the resources to resolve them straight away, and they can be quite demanding to fix. At the same time, I think we've found some issues in CI that we don't see in our local test runs, generally race conditions or deadlocks. I'll go with the consensus on this one 🙂
That's a good point. I wonder if we should have one set of tests specifically run with higher concurrency, or maybe just leave the current setup and expect it to pass in any case.
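One way to get a dedicated set of higher-concurrency tests, sketched below, would be to gate parallel execution on an environment variable. The variable name, helper, and test name are assumptions for illustration only; the idea is that the default CI job keeps GOMAXPROCS=1 while a separate job could opt in to more concurrency:

package tests

import (
	"os"
	"testing"
)

// maybeParallel marks the calling test as parallel only when the
// (hypothetical) K6_BROWSER_PARALLEL_TESTS variable is set, e.g. by a
// dedicated CI job that exercises the concurrency-sensitive paths.
func maybeParallel(t *testing.T) {
	t.Helper()
	if os.Getenv("K6_BROWSER_PARALLEL_TESTS") == "true" {
		t.Parallel()
	}
}

// TestBrowserConcurrencySmoke is a placeholder showing how a test would
// opt in to the parallel set.
func TestBrowserConcurrencySmoke(t *testing.T) {
	maybeParallel(t)
	// ... test body ...
}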
This is the change that makes it pass consistently (well, to some degree). Otherwise, it constantly fails.
While testing this PR with several commits (in and out), and adding/removing some or all of the tests, I noticed that the nature of the tests doesn't matter for the outcome of the test runs: whenever there is concurrency, there is a failure here.
@@ -55,7 +55,7 @@ jobs:
         export GOMAXPROCS=1
         args=("-p" "1" "-race")
         export K6_BROWSER_HEADLESS=true
-        go test "${args[@]}" -timeout 5m ./...
+        go test "${args[@]}" -timeout 10m ./...
Why do we need such a large timeout?
Since the concurrency is 1 now, and it was 2 before, I thought it would make sense to double the timeout. When I tested it, tests took about 2x longer.
We're seeing this CI issue with our tagged releases too. I've tested
Sure :) TBH, I'm as confused as you are :( I believe this issue is caused by GH infrastructure changes and not by our changes. As I mentioned in my previous edits to the PR description, I've tried with earlier merges, and then went even further back to v0.8.0. Nothing changed. This PR was created entirely through experimentation, so I don't have the best answers to your questions :( TL;DR: I don't even know myself what/why I'm doing there 😄 I've tried to answer your questions in the comments above.
Although this one is a fix that costs nothing, #947 seems like a more elegant fix. Closing this one; we can reopen it if needed. Update: it worked brilliantly with the addition of ubuntu-pro ($). Otherwise, we need to retry one or two times ($ = free). Even with a concurrency of one, tests finish in ~5m. Note that the workers (CI tasks) are still concurrent.
I'm just curious to see what happens :-)
The solution in this PR could still be improved, but it's better than the current workflows: when we run it multiple times, the tests eventually pass. Feel free to push commits to this PR if needed.
PS: While testing this PR, GH automatically closed it multiple times 😄