combine checks into a single job #4

ru-fu · 2023-09-28T09:29:37Z

Every job takes up a slot on a runner - which means more load on runners, and also more queuing time (for everyone). Combining the three jobs into one job with steps for each decreases the load, and it should also speed things up overall since we're checking out the repo only once.

ru-fu · 2023-09-28T09:30:46Z

Any idea why these checks don't run on this repo?
How do I test them?

tomponline · 2023-09-28T09:32:30Z

@ru-fu you need to tweak the on part, e.g.

https://github.com/canonical/lxd/blob/main/.github/workflows/tests.yml#L2-L4

ru-fu · 2023-09-28T10:10:57Z

@ru-fu you need to tweak the on part, e.g.

Couldn't get that to work. :(

But I tested with another repo now:
Working with good input
Failing as planned

tomponline

Thanks!

tomponline

thanks!

ColmBhandal

I think this is a good idea. The only thing though that I'd ask to change if possible would be to ensure that earlier failures don't prevent later checks from running. Since the checks are now sequential, this is happening. In the failure case, a failed spelling check prevents link and woke from running, which is a pity because it's nicer to see all applicable errors at once.

Two ideas for solutions - not sure which are actually possible:

If there was somehow a way of telling the GitHub job to try subsequent steps even if the earlier one failed - but still fail the entire job if any step fails. Maybe by recording step outputs and failing if any of them are bad at the end.
If there was a way to parallelize running steps within a single job

tomponline · 2023-09-28T11:40:30Z

It is nice in theory to have parallel jobs, unfortunately due to the wait times to get workers on GH runners (at least in the Canonical org anyway) it can mean 10s of minutes or even hours (like yesterday 3.5 hours) extra wait time.

So this is an efficiency change to reduce resource consumption (by repetitive building of the same source to run different checks) and wait times to get access to workers in the first place.

ru-fu · 2023-09-28T11:44:10Z

I think you can force steps to always run, independent of previous results. Let's test.

ru-fu · 2023-09-28T11:50:46Z

This works now: https://github.com/canonical/microcloud/actions/runs/6338395104/job/17215408285?pr=171

It only raises the first error though, but the other checks still run, so you can check them to see if they complete or not.

ColmBhandal · 2023-09-28T12:23:49Z

This works now: https://github.com/canonical/microcloud/actions/runs/6338395104/job/17215408285?pr=171

It only raises the first error though, but the other checks still run, so you can check them to see if they complete or not.

It's an improvement but not ideal that we can't see error icons on subsequent steps.

I have a suggestion, can you try this at the job level:

continue-on-error: true

See GitHub docs.

ColmBhandal

Instead of if: success() || failure() please try continue-on-error: true at the job level

ru-fu · 2023-09-28T12:28:40Z

Instead of if: success() || failure() please try continue-on-error: true at the job level

This won't work, because the full job will then return a success instead of a failure.
See https://stackoverflow.com/a/70048004

ColmBhandal · 2023-09-28T12:41:35Z

Instead of if: success() || failure() please try continue-on-error: true at the job level

This won't work, because the full job will then return a success instead of a failure. See https://stackoverflow.com/a/70048004

It may work or it may not, I'm not sure. I'd suggest you try it as it would be cleaner if it did work. The behaviour may have changed - see https://stackoverflow.com/a/76877377.

ru-fu · 2023-09-28T12:53:28Z

I have tried this before. It has not changed. Here's the proof: https://github.com/canonical/microcloud/actions/runs/6339134573/job/17217641161?pr=171

Every job takes up a slot on a runner - which means more load on runners, and also more queuing time (for everyone). Combining the three jobs into one job with steps for each decreases the load, and it should also speed things up overall since we're checking out the repo only once. Signed-off-by: Ruth Fuchss <ruth.fuchss@canonical.com>

ColmBhandal · 2023-09-28T13:02:27Z

I have tried this before. It has not changed. Here's the proof: https://github.com/canonical/microcloud/actions/runs/6339134573/job/17217641161?pr=171

Looking at this diff, I wonder did you add those things at the step level, rather than the job level? Did you try adding the setting at the job level?

ru-fu · 2023-09-28T13:07:22Z

I have added them on step level.
We only have one job. Adding continue-on-error on job level would give us both problems at the same time - the job will stop after the first failed step, but we'll get a success back.

ColmBhandal · 2023-09-28T13:20:42Z

I have added them on step level. We only have one job. Adding continue-on-error on job level would give us both problems at the same time - the job will stop after the first failed step, but we'll get a success back.

That may be true. I'd like to try it though because it's unclear to me if that's the beahviour e.g. according to this: https://stackoverflow.com/a/73357322:

this will continue all steps, if either fails:

ColmBhandal · 2023-09-28T13:36:01Z

Another idea that may work and would make it clearer which steps failed would be to continue on error per step, and then add a step at the end of the job aggregating which prior steps failed. This way, we don't get the confusing display of the first step failing but not others, and with aggregation we should be able to print out which steps failed, so that the user can check those.

Something like this:

name: Automatic documentation checks

on:
  workflow_call:
    inputs:
      working-directory:
        description: 'Working directory'
        required: true
        type: string
    outputs:
      spellcheck-result:
        description: "Result of the spelling check"
        value: ${{ jobs.docchecks.outputs.result_spelling }}
      woke-result:
        description: "Result of the inclusive language check"
        value: ${{ jobs.docchecks.outputs.result_woke }}
      linkcheck-result:
        description: "Result of the link check"
        value: ${{ jobs.docchecks.outputs.result_links }}

jobs:
  docchecks:
    name: Run documentation checks
    runs-on: ubuntu-latest
    outputs:
      result_spelling: ${{ steps.spellcheck-step.outcome }}
      result_woke: ${{ steps.woke-step.outcome }}
      result_links: ${{ steps.linkcheck-step.outcome }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Spell Check
        id: spellcheck-step
        uses: canonical/documentation-workflows/spellcheck@main
        with:
          working-directory: ${{ inputs.working-directory }}
        continue-on-error: true

      - name: Inclusive Language Check
        id: woke-step
        uses: canonical/documentation-workflows/inclusive-language@main
        with:
          working-directory: ${{ inputs.working-directory }}
        continue-on-error: true

      - name: Link Check
        id: linkcheck-step
        uses: canonical/documentation-workflows/linkcheck@main
        with:
          working-directory: ${{ inputs.working-directory }}
        continue-on-error: true

      - name: Check Step Results
        run: |
          errors=()
          if [ ${{ steps.spellcheck-step.outcome }} == 'failure' ]; then
            echo "Spell Check failed"
            errors+=("Spell Check")
          fi
          if [ ${{ steps.woke-step.outcome }} == 'failure' ]; then
            echo "Inclusive Language Check failed"
            errors+=("Inclusive Language Check")
          fi
          if [ ${{ steps.linkcheck-step.outcome }} == 'failure' ]; then
            echo "Link Check failed"
            errors+=("Link Check")
          fi

          if [ ${#errors[@]} -gt 0 ]; then
            echo "Failing the job due to errors in steps: ${errors[@]}"
            exit 1
          fi

ru-fu · 2023-09-28T15:40:08Z

Can we just get this in and improve on it later?
The main problem I want to solve is to avoid jobs to run forever, and this is what this PR does. It also gives the possibility to see if any of the other steps are failing as well.

I'm sure there's ways to tune it to make it even better, but I don't really have time to spend hours on this at the moment to find the perfect solution.

ColmBhandal

We've traded off clarity of error reporting for resource / speed efficiency. I trust that this practical trade-off makes sense, so I'll approve, but it's not a nice report to read now. I've added an issue to slightly improve the error reporting (although it still won't be as good as the parallel case): #5.

ru-fu requested review from ColmBhandal and tomponline September 28, 2023 09:30

ru-fu force-pushed the combine-workflows branch 6 times, most recently from cd59141 to 5f0b31f Compare September 28, 2023 09:56

tomponline approved these changes Sep 28, 2023

View reviewed changes

ColmBhandal reviewed Sep 28, 2023

View reviewed changes

ru-fu force-pushed the combine-workflows branch from 5f0b31f to 815ccc2 Compare September 28, 2023 11:42

ru-fu force-pushed the combine-workflows branch from 815ccc2 to 4f48723 Compare September 28, 2023 11:47

tomponline approved these changes Sep 28, 2023

View reviewed changes

ColmBhandal suggested changes Sep 28, 2023

View reviewed changes

ru-fu force-pushed the combine-workflows branch from 4f48723 to 083d2d8 Compare September 28, 2023 12:51

ru-fu force-pushed the combine-workflows branch from 083d2d8 to f793621 Compare September 28, 2023 12:54

ColmBhandal mentioned this pull request Sep 28, 2023

Improve Failure Reporting #5

Closed

ColmBhandal approved these changes Sep 28, 2023

View reviewed changes

ru-fu merged commit f0131b1 into canonical:main Sep 28, 2023

ru-fu deleted the combine-workflows branch September 28, 2023 16:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

combine checks into a single job #4

combine checks into a single job #4

ru-fu commented Sep 28, 2023

ru-fu commented Sep 28, 2023

tomponline commented Sep 28, 2023

ru-fu commented Sep 28, 2023

tomponline left a comment

tomponline left a comment

ColmBhandal left a comment

tomponline commented Sep 28, 2023 •

edited

Loading

ru-fu commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ColmBhandal left a comment

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal left a comment

combine checks into a single job #4

combine checks into a single job #4

Conversation

ru-fu commented Sep 28, 2023

ru-fu commented Sep 28, 2023

tomponline commented Sep 28, 2023

ru-fu commented Sep 28, 2023

tomponline left a comment

Choose a reason for hiding this comment

tomponline left a comment

Choose a reason for hiding this comment

ColmBhandal left a comment

Choose a reason for hiding this comment

tomponline commented Sep 28, 2023 • edited Loading

ru-fu commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ColmBhandal left a comment

Choose a reason for hiding this comment

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ColmBhandal commented Sep 28, 2023

ru-fu commented Sep 28, 2023

ColmBhandal left a comment

Choose a reason for hiding this comment

tomponline commented Sep 28, 2023 •

edited

Loading