Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine checks into a single job #4

Merged
merged 1 commit into from
Sep 28, 2023
Merged

Conversation

ru-fu
Copy link
Contributor

@ru-fu ru-fu commented Sep 28, 2023

Every job takes up a slot on a runner - which means more load on runners, and also more queuing time (for everyone). Combining the three jobs into one job with steps for each decreases the load, and it should also speed things up overall since we're checking out the repo only once.

@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

Any idea why these checks don't run on this repo?
How do I test them?

@tomponline
Copy link
Member

@ru-fu ru-fu force-pushed the combine-workflows branch 6 times, most recently from cd59141 to 5f0b31f Compare September 28, 2023 09:56
@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

@ru-fu you need to tweak the on part, e.g.

Couldn't get that to work. :(

But I tested with another repo now:
Working with good input
Failing as planned

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Contributor

@ColmBhandal ColmBhandal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea. The only thing though that I'd ask to change if possible would be to ensure that earlier failures don't prevent later checks from running. Since the checks are now sequential, this is happening. In the failure case, a failed spelling check prevents link and woke from running, which is a pity because it's nicer to see all applicable errors at once.

Two ideas for solutions - not sure which are actually possible:

  1. If there was somehow a way of telling the GitHub job to try subsequent steps even if the earlier one failed - but still fail the entire job if any step fails. Maybe by recording step outputs and failing if any of them are bad at the end.
  2. If there was a way to parallelize running steps within a single job

@tomponline
Copy link
Member

tomponline commented Sep 28, 2023

It is nice in theory to have parallel jobs, unfortunately due to the wait times to get workers on GH runners (at least in the Canonical org anyway) it can mean 10s of minutes or even hours (like yesterday 3.5 hours) extra wait time.

So this is an efficiency change to reduce resource consumption (by repetitive building of the same source to run different checks) and wait times to get access to workers in the first place.

@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

I think you can force steps to always run, independent of previous results. Let's test.

@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

This works now: https://github.com/canonical/microcloud/actions/runs/6338395104/job/17215408285?pr=171

It only raises the first error though, but the other checks still run, so you can check them to see if they complete or not.

@ColmBhandal
Copy link
Contributor

This works now: https://github.com/canonical/microcloud/actions/runs/6338395104/job/17215408285?pr=171

It only raises the first error though, but the other checks still run, so you can check them to see if they complete or not.

It's an improvement but not ideal that we can't see error icons on subsequent steps.

I have a suggestion, can you try this at the job level:

continue-on-error: true

See GitHub docs.

Copy link
Contributor

@ColmBhandal ColmBhandal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of if: success() || failure() please try continue-on-error: true at the job level

@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

Instead of if: success() || failure() please try continue-on-error: true at the job level

This won't work, because the full job will then return a success instead of a failure.
See https://stackoverflow.com/a/70048004

@ColmBhandal
Copy link
Contributor

Instead of if: success() || failure() please try continue-on-error: true at the job level

This won't work, because the full job will then return a success instead of a failure. See https://stackoverflow.com/a/70048004

It may work or it may not, I'm not sure. I'd suggest you try it as it would be cleaner if it did work. The behaviour may have changed - see https://stackoverflow.com/a/76877377.

@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

I have tried this before. It has not changed. Here's the proof: https://github.com/canonical/microcloud/actions/runs/6339134573/job/17217641161?pr=171

Every job takes up a slot on a runner - which means more load on
runners, and also more queuing time (for everyone).
Combining the three jobs into one job with steps for each decreases
the load, and it should also speed things up overall since we're
checking out the repo only once.

Signed-off-by: Ruth Fuchss <ruth.fuchss@canonical.com>
@ColmBhandal
Copy link
Contributor

I have tried this before. It has not changed. Here's the proof: https://github.com/canonical/microcloud/actions/runs/6339134573/job/17217641161?pr=171

Looking at this diff, I wonder did you add those things at the step level, rather than the job level? Did you try adding the setting at the job level?

@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

I have added them on step level.
We only have one job. Adding continue-on-error on job level would give us both problems at the same time - the job will stop after the first failed step, but we'll get a success back.

@ColmBhandal
Copy link
Contributor

I have added them on step level. We only have one job. Adding continue-on-error on job level would give us both problems at the same time - the job will stop after the first failed step, but we'll get a success back.

That may be true. I'd like to try it though because it's unclear to me if that's the beahviour e.g. according to this: https://stackoverflow.com/a/73357322:

this will continue all steps, if either fails:

@ColmBhandal
Copy link
Contributor

Another idea that may work and would make it clearer which steps failed would be to continue on error per step, and then add a step at the end of the job aggregating which prior steps failed. This way, we don't get the confusing display of the first step failing but not others, and with aggregation we should be able to print out which steps failed, so that the user can check those.

Something like this:

name: Automatic documentation checks

on:
  workflow_call:
    inputs:
      working-directory:
        description: 'Working directory'
        required: true
        type: string
    outputs:
      spellcheck-result:
        description: "Result of the spelling check"
        value: ${{ jobs.docchecks.outputs.result_spelling }}
      woke-result:
        description: "Result of the inclusive language check"
        value: ${{ jobs.docchecks.outputs.result_woke }}
      linkcheck-result:
        description: "Result of the link check"
        value: ${{ jobs.docchecks.outputs.result_links }}

jobs:
  docchecks:
    name: Run documentation checks
    runs-on: ubuntu-latest
    outputs:
      result_spelling: ${{ steps.spellcheck-step.outcome }}
      result_woke: ${{ steps.woke-step.outcome }}
      result_links: ${{ steps.linkcheck-step.outcome }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Spell Check
        id: spellcheck-step
        uses: canonical/documentation-workflows/spellcheck@main
        with:
          working-directory: ${{ inputs.working-directory }}
        continue-on-error: true

      - name: Inclusive Language Check
        id: woke-step
        uses: canonical/documentation-workflows/inclusive-language@main
        with:
          working-directory: ${{ inputs.working-directory }}
        continue-on-error: true

      - name: Link Check
        id: linkcheck-step
        uses: canonical/documentation-workflows/linkcheck@main
        with:
          working-directory: ${{ inputs.working-directory }}
        continue-on-error: true

      - name: Check Step Results
        run: |
          errors=()
          if [ ${{ steps.spellcheck-step.outcome }} == 'failure' ]; then
            echo "Spell Check failed"
            errors+=("Spell Check")
          fi
          if [ ${{ steps.woke-step.outcome }} == 'failure' ]; then
            echo "Inclusive Language Check failed"
            errors+=("Inclusive Language Check")
          fi
          if [ ${{ steps.linkcheck-step.outcome }} == 'failure' ]; then
            echo "Link Check failed"
            errors+=("Link Check")
          fi

          if [ ${#errors[@]} -gt 0 ]; then
            echo "Failing the job due to errors in steps: ${errors[@]}"
            exit 1
          fi

@ru-fu
Copy link
Contributor Author

ru-fu commented Sep 28, 2023

Can we just get this in and improve on it later?
The main problem I want to solve is to avoid jobs to run forever, and this is what this PR does. It also gives the possibility to see if any of the other steps are failing as well.

I'm sure there's ways to tune it to make it even better, but I don't really have time to spend hours on this at the moment to find the perfect solution.

Copy link
Contributor

@ColmBhandal ColmBhandal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've traded off clarity of error reporting for resource / speed efficiency. I trust that this practical trade-off makes sense, so I'll approve, but it's not a nice report to read now. I've added an issue to slightly improve the error reporting (although it still won't be as good as the parallel case): #5.

@ru-fu ru-fu merged commit f0131b1 into canonical:main Sep 28, 2023
@ru-fu ru-fu deleted the combine-workflows branch September 28, 2023 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants