Add ability to re-run single jobs #432

tvdeyen · 2020-04-17T07:42:31Z

Please add the ability to re-run single jobs of a workflow. This is such a basic feature.

Please keep the environment in mind while prioritizing features.

🙏

TingluoHuang · 2020-04-17T12:17:48Z

Tagging @chrispat from the product team for this feedback. 😄

ygj6 · 2020-12-02T01:46:09Z

Any updates?

fuesec · 2021-01-07T09:11:42Z

would love to see this feature

schw4rzlicht · 2021-03-03T10:03:22Z

Any updates on this? In a parallelized workflow, we always spend all 100 build minutes again, even if only one of the 1-minute jobs fails.

rathboma · 2021-03-18T20:53:14Z

💯 this is a much needed addition.

I maintain Beekeeper Studio, and random timeouts cause 1/5 jobs to fail fairly regularly. Being able to re-run only the jobs that failed would save us so much time.

I would also love to be able to retry a single step, and to have other jobs not abort when a single job fails.

Ferroin · 2021-03-19T02:03:55Z

This is a crucial feature for anyone doing multi-arch package builds and deployments with tooling that cannot build multiple architectures in parallel. Without this, much more complicated workflows are required to ensure packages don’t get deployed multiple times just because one of the jobs failed.

sindrijo · 2021-03-19T14:07:34Z

This is a much-needed feature for my team, which is building a CI pipeline for building/testing/deploying multiple packages for multiple platforms. Not having this feature seems very wasteful in both time and electrons.

prettycoder · 2021-03-25T23:30:51Z

Have to chime in. My matrix generates 61 jobs, and one of them usually fails because of where it collects data from. The next time I re-run all 61 jobs, another one fails...

tbarbugli · 2021-03-26T11:37:50Z

Our team is currently exploding 1 workflow into many jobs. This is a terrible hack to get retries to work, because you can no longer make much sense of the status of a branch/commit (the Actions UI does not do any grouping at that point).

OmgImAlexis · 2021-03-27T19:38:06Z

It's been a year, @TingluoHuang is there any status update on this?

Has it been at least considered?

maragunde93 · 2021-04-13T20:22:18Z

Is there any update on this? We have migrated from GitLab CI to GitHub Actions and I am already regretting it. We have a pipeline that deploys infrastructure and takes around 1 hour; if something fails, retrying the whole pipeline is a big loss.

johanhelsing-attensi · 2021-05-07T11:04:15Z

I'm honestly quite disappointed that this doesn't seem to have been prioritized, and no response as to when/if it can be expected (just that it was "definitely on the backlog" back in 2019).

In order to provide some substance, and not just spam everyone with complaints and a +1, here's a summary about what I found with respect to workarounds:

https://github.community/t/re-run-jobs/16145/11

I think I found a workaround here. I have split my matrix build into multiple YAML files, each with a different name but the same trigger. Each of them contains only a single run. As far as I can tell, this enables me to re-run jobs individually. I do so by selecting “re-run all jobs”, where “all” is now always exactly one.

The price you need to pay is that you have some code duplication and cannot use the matrix feature. For me, this is acceptable as the CI is actually done by a script and shared code outside of that script is relatively minimal.

Unfortunately, this is a bit clunky to use, as (at least last I checked) neither includes nor YAML anchors are supported in workflows, so code reuse and maintainability across projects will be a pain. Also, I don't really understand how I could implement rules like "deploy to staging once all builds pass".

The matrix feature is also really nice, and I'd hate to lose it.

There is another interesting hack, which stores the last run status in cache and then skips jobs based on that status:

How they used it:

    - name: Set default run status
      # A later step (not shown) overwrites this file with
      # last_run_status=success once the job succeeds, so the cache saves it.
      run: echo "last_run_status=default" > last_run_status

    - name: Restore last run status
      id: last_run
      uses: actions/cache@v2
      with:
        path: |
          last_run_status
        # steps.date is defined in an earlier step that is not shown here
        key: ${{ github.run_id }}-${{ matrix.os }}-${{ matrix.node-version }}-${{ matrix.webpack }}-${{ steps.date.outputs.date }}
        restore-keys: |
          ${{ github.run_id }}-${{ matrix.os }}-${{ matrix.node-version }}-${{ matrix.webpack }}-

    - name: Set last run status
      id: last_run_status
      # Copy the cached (or default) status into this step's outputs
      run: cat last_run_status >> "$GITHUB_OUTPUT"

    - name: Checkout ref
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.ref }}

    - name: Use Node.js ${{ matrix.node-version }}
      if: steps.last_run_status.outputs.last_run_status != 'success'
      uses: actions/setup-node@v1
      with:
        node-version: ${{ matrix.node-version }}

btjones-me · 2021-05-25T15:04:12Z

+1 we would be grateful for this feature.

gergo-papp · 2021-05-25T16:37:05Z

+1 We are evaluating different CI providers right now (after potentially migrating from Travis) and I'm sure this is a really important feature for many other developers as well

Sarga · 2021-05-27T11:50:15Z

+1 we need this feature.

rr-nick-tan · 2021-05-27T15:11:54Z

Looking for this feature too; otherwise, we have to split the workflow into multiple ones.

marcelwa · 2021-06-01T12:23:30Z

+1 please save the planet!

domdfcoding · 2021-06-01T12:56:00Z

To avoid spamming everyone with notifications please use GitHub's reaction buttons instead of commenting "+1 we want this". Thanks 😃

Omzig · 2022-02-01T23:01:24Z

Did you know that *.visualstudio.com can do this in devops?

dylanbhughes · 2022-02-28T19:39:11Z

Hope to see it this quarter 🙏

ethomson · 2022-03-01T21:46:29Z

Hello everyone! I strongly agree that this is a thing we need, and in fact it is something we're working on. However, this is part of Actions itself, not part of the runner application (meaning: the software that's in this repository).

In order to keep things tidy for the runner team (the developers who are working on this application), I'm going to close this issue so that it stays off their bug list.

This is being tracked in our feedback repository which is where you can request features in GitHub Actions. Thanks for all the feedback, everyone, and I hope to see you in our feedback repo.

bartlettroscoe · 2022-03-09T14:55:55Z

NOTE: Avoiding rerunning jobs that have already passed is about more than just saving computing cycles. It is also critical for avoiding the cumulative probability of failure across the different jobs, which can significantly increase the number of testing iterations needed to get all jobs passing. For example, suppose a GitHub Actions setup runs seven independent jobs to test a PR, and each of the seven PR builds has a 20% chance of a random failure. Then the chance that at least one of the PR builds fails jumps to 1 - (1 - 0.2)^7 ≈ 0.79, or about 79%! And if any job fails, you have to rerun all of the jobs, and the probability of failure the next time is still about 79%, and so on. The result is that it can take many PR testing iterations to get all of the jobs to pass.
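
The arithmetic above can be checked with a short Python sketch. It assumes, as the comment does, that jobs fail independently with the same probability; `prob_any_failure` and `expected_full_runs` are illustrative names, and treating whole-workflow reruns as a geometric distribution is an added modeling assumption rather than something stated in the comment.

```python
def prob_any_failure(num_jobs: int, p_fail: float) -> float:
    """P(at least one of num_jobs independent jobs fails)."""
    return 1.0 - (1.0 - p_fail) ** num_jobs


def expected_full_runs(num_jobs: int, p_fail: float) -> float:
    """Expected number of whole-workflow runs until all jobs pass in one run.

    Each run is all-green with probability (1 - p_fail) ** num_jobs, so the
    number of runs needed is geometric with mean 1 / that probability.
    """
    return 1.0 / (1.0 - p_fail) ** num_jobs


if __name__ == "__main__":
    # Seven independent PR jobs, each with a 20% chance of a random failure.
    print(f"P(at least one failure) = {prob_any_failure(7, 0.2):.3f}")
    print(f"Expected full runs      = {expected_full_runs(7, 0.2):.2f}")
```

Under these assumptions, a developer would on average need almost five full runs before seeing all seven jobs pass in the same run.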

This occurs relatively frequently, for example, in the Trilinos PR testing system (which currently uses a custom PR testing system which also lacks the ability to rerun individual jobs and where each job has a non-trivial random probability of failure).

What this means is that if you can't rerun single jobs that fail, then you just can't effectively scale to a large number of testing jobs. As another example, if you have 100 GitHub Actions jobs, each with just a 1% chance of failing (which is about the frequency of failures in just fetching dependencies in a GitHub Actions job), then the cumulative probability of failure across these 100 jobs is 1 - (1 - 0.01)^100 ≈ 0.63, or 63%! But if you can rerun individual jobs, only the jobs that failed need to pass on the next attempt, so after a first run with a 63% cumulative probability of failure, getting a full set of passing jobs becomes much more probable. If just a single job failed in the first run of all the GHA jobs (due to a random failure), then rerunning that one job would have just a 1% chance of failing, i.e. a 99% chance of passing. That reduces wasted computing resources and shortens the wall-clock time of the testing cycle.
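
To compare the two strategies for the 100-job example, here is a Python sketch under the same independence assumption. It additionally assumes that every job in a full rerun runs to completion and that "rerun failed jobs" retries each failed job until it passes; the function names are hypothetical. The expected number of attempts with failed-only reruns uses E[R] = sum over k ≥ 0 of P(R > k), where P(R ≤ k) = (1 - p^k)^n.

```python
from typing import Tuple


def full_rerun_cost(num_jobs: int, p_fail: float) -> Tuple[float, float]:
    """(expected attempts, expected job executions) when every retry reruns all jobs."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    attempts = 1.0 / (1.0 - p_fail) ** num_jobs
    return attempts, num_jobs * attempts


def failed_only_cost(num_jobs: int, p_fail: float, tol: float = 1e-12) -> Tuple[float, float]:
    """(expected attempts, expected job executions) when only failed jobs are retried.

    A job still fails after k attempts with probability p_fail ** k, so all jobs
    have passed within k attempts with probability (1 - p_fail ** k) ** num_jobs.
    """
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    attempts = 0.0
    k = 0
    while True:
        tail = 1.0 - (1.0 - p_fail ** k) ** num_jobs  # P(more than k attempts needed)
        attempts += tail
        if tail < tol:
            break
        k += 1
    # Each job is retried until it passes: geometric with mean 1 / (1 - p_fail).
    executions = num_jobs / (1.0 - p_fail)
    return attempts, executions


if __name__ == "__main__":
    for name, cost in (("full rerun", full_rerun_cost), ("failed only", failed_only_cost)):
        attempts, executions = cost(100, 0.01)
        print(f"{name:>11}: {attempts:.2f} attempts, {executions:.1f} job executions")
```

With 100 jobs at 1% each, this model gives roughly 2.7 full runs and about 273 job executions when everything is rerun, versus about 1.6 attempts and about 101 job executions when only failed jobs are rerun.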

This is a big deal for projects that need many testing jobs and have a higher probability of failure in any individual job.

piotrekkr · 2022-03-16T09:39:30Z

Seems like it is live now and we can rerun single jobs. I'm really grateful to the devs for implementing this 🙏 🎉

And now a tiny rant 😅

It's kinda broken when using a job matrix with Cypress parallel tests...

Here is how it worked, and why it does not work well with the re-run failed jobs feature:

In a "setup" job, we generated a unique ID for the Cypress test run.
Next, we used a job matrix to start three workers that ran Cypress tests in parallel using the generated ID.
When some jobs in the matrix failed, we reran the full workflow, which generated a new ID and ran the whole matrix again.

Why doesn't it work when rerunning only one matrix job? Because the unique ID is the same, so Cypress considers the run finished and does not run the tests again. I did not find a way to force them to run again with the same ID. The options we have are:

Rerun the whole workflow again (the old way)
Rerun only the failed jobs (creates a new workflow run attempt containing only the failed matrix jobs, with the same unique ID)
Manually rerun the failed jobs one by one (there is no way to manually rerun the whole matrix)
Rerun the "setup" job, which generates a new ID and also reruns all dependent jobs

The first approach is slow, since everything needs to be rerun. The second approach seems best at first, because we could use UNIQUE_ID-RUN_ATTEMPT as the Cypress ID, but it can be problematic when only 1 of 10 matrix jobs failed: a single runner then has to handle all the e2e tests again (no parallelization). The third approach is not good either, since we cannot select multiple jobs to rerun manually in the same attempt, so again there is no parallelization. The last approach is what we use now, and it works OK, but developers need to remember to rerun the setup job instead of just rerunning the failed jobs.
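
The UNIQUE_ID-RUN_ATTEMPT idea from the second approach could look roughly like the Python sketch below. GitHub Actions sets the `GITHUB_RUN_ATTEMPT` environment variable (1 for the first attempt, incremented on each re-run), and the Cypress CLI accepts `--record`, `--parallel` and `--ci-build-id`; the helper names and the way the setup job's unique ID reaches this script are assumptions. As the comment points out, this gives each attempt a fresh ID but does not restore parallelism when only one matrix job is re-run.

```python
import os
import shlex
from typing import List, Mapping, Optional


def cypress_ci_build_id(unique_id: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Combine the setup job's unique ID with the current run attempt.

    The unique ID is reused when failed jobs are re-run, but GITHUB_RUN_ATTEMPT
    increases, so Cypress sees a new build instead of an already finished one.
    """
    env = os.environ if env is None else env
    attempt = env.get("GITHUB_RUN_ATTEMPT", "1")  # fallback for local runs
    return f"{unique_id}-{attempt}"


def cypress_command(build_id: str) -> List[str]:
    """Hypothetical Cypress invocation for one matrix worker."""
    return ["npx", "cypress", "run", "--record", "--parallel", "--ci-build-id", build_id]


if __name__ == "__main__":
    # Hypothetical ID as produced by the "setup" job, on the second attempt.
    build_id = cypress_ci_build_id("e2e-42", {"GITHUB_RUN_ATTEMPT": "2"})
    print(shlex.join(cypress_command(build_id)))
```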

So, to sum up:

Maybe we could add a flag that marks the whole matrix as failed when one of the jobs inside the matrix fails? Then rerunning all failed jobs would rerun the whole matrix. What do you think?

Thanks

tvdeyen · 2022-03-16T10:44:21Z

Wow. Finally. Two years of wasting precious resources later, it finally shipped. Thanks to everyone involved.

Jolg42 · 2022-03-16T12:49:54Z

Can confirm it's here! 🎊

Looks like Santa was early this year 🎅🏼

willyt150 · 2022-03-21T15:26:58Z

I saw the release announcement for re-running single jobs. Is this being released in phases or something? The GitHub Enterprise repos I'm working on still don't have any ability to re-run individual jobs.

I thought maybe it just wouldn't work with old runs, so I kicked off new ones, and still nothing: just the re-run all jobs option.

chrispat · 2022-03-21T15:45:01Z

It is currently available on github.com only and is slated to ship in the next update to GitHub Enterprise. In addition, there are still some issues related to reusable workflows that we are ironing out.

davegallant · 2022-03-21T20:27:22Z

This is amazing work. I'm not seeing the option to re-run failed jobs for reusable workflows. I wasn't sure whether it's because the call to the reusable workflow we're using depends on another job.

EDIT: For more context: the first job is reading configuration and then passing the config to the reusable workflow call that starts several jobs in a matrix.

chrispat · 2022-03-22T00:40:48Z

We have temporarily disabled the feature for any run that references a reusable workflow while we iron out the issues. We hope to have those resolved towards the end of this week or early next week.

debugger24 · 2022-03-26T14:27:09Z

Unable to rerun a single job when some jobs are pending deployment review.

Here, I want to rerun build_3 before approving build_4.

piotrekkr · 2022-03-27T20:15:21Z

@debugger24 This is only my guess, but it's probably by design. Rerunning any job creates a whole new run attempt for the entire workflow. All jobs that do not depend on the job you want to rerun are "cloned" into the new run attempt. But to clone a job you need its result first, so you have to wait for all jobs to finish.
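
The guess above can be modeled with a small Python sketch. This is a hypothetical model of the behaviour described in the thread, not GitHub's actual implementation: `plan_rerun`, the `needs` mapping (mirroring the `needs:` key in workflow YAML) and the state strings are assumptions. Jobs selected for a re-run, plus everything that transitively depends on them, run again; all other results are copied into the new attempt, which is only possible once every job has reached a final state.

```python
from typing import Dict, List, Set, Tuple

# Final job states in this model; anything else counts as still in progress.
FINAL_STATES = {"success", "failure", "cancelled", "skipped"}


def plan_rerun(
    needs: Dict[str, List[str]],
    results: Dict[str, str],
    selected: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Split jobs into (rerun, copied) for a new run attempt."""
    pending = sorted(job for job, state in results.items() if state not in FINAL_STATES)
    if pending:
        raise ValueError(f"cannot re-run yet, jobs still in progress: {pending}")

    rerun = set(selected)
    changed = True
    while changed:  # propagate to jobs that depend on anything being re-run
        changed = False
        for job, deps in needs.items():
            if job not in rerun and rerun.intersection(deps):
                rerun.add(job)
                changed = True
    copied = set(results) - rerun
    return rerun, copied


if __name__ == "__main__":
    needs = {"setup": [], "build_1": ["setup"], "build_2": ["setup"], "deploy": ["build_1", "build_2"]}
    results = {"setup": "success", "build_1": "success", "build_2": "failure", "deploy": "skipped"}
    failed = {job for job, state in results.items() if state == "failure"}
    rerun, copied = plan_rerun(needs, results, failed)
    print("rerun:", sorted(rerun), "copied:", sorted(copied))
```

In this model, a job waiting on a deployment review is simply not in a final state, so any re-run request raises an error until it completes, which matches the behaviour reported above.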

Drowze · 2022-03-28T13:24:30Z

Found another unexpected behaviour:

given a job that submits a manual status check (e.g. via API) that has passed
and given a different job that has failed
when I retry only failed jobs, the resulting check group will not have the manual check (submitted by the job that has passed on the first try)

This is a problem for us: we have a manual check called "Rubocop" (submitted manually using reviewdog) that is required for a pull request to be merged. If we retry only the failed jobs, all jobs end up passing, but the manual check is missing, so the PR can't be merged.

Screenshots of such a case (first with failed jobs, then after re-running, but without the manual Rubocop status check)

mrmike · 2022-04-05T09:07:38Z

Regarding the temporary disabling of the feature for runs that reference a reusable workflow: do you have a public issue open for this case? I'd like to track its progress.

hugovk · 2022-04-08T05:20:23Z

This is now working, thanks!

madhavajay · 2022-05-05T23:53:00Z

Is it possible to re-run a failed job before the others finish? We have quite long-running jobs, which means the wait to retry a test that failed due to some weird external issue is really long.

janpio · 2023-11-01T15:41:31Z

It is not possible yet, @madhavajay, so I created a feedback discussion to suggest it: https://github.com/orgs/community/discussions/73156 Leave an upvote or reaction over there! (Ideally, so should the 43 other people who upvoted the previous comment 😆)

abhilash1in · 2024-04-17T00:04:04Z

I don't see a "Re-run failed jobs" option in the dropdown.

I also don't see an option to re-run individual jobs when I hover over them.

Is this a bug or am I doing something wrong?

piotrekkr · 2024-04-17T07:16:44Z

@abhilash1in Are you sure that all jobs inside the workflow are finished? If they are not done yet, there will be no option to rerun. GitHub requires the full workflow to finish before it can be rerun.

abhilash1in · 2024-04-17T23:09:09Z

Erm, okay, not all jobs had finished running when I was looking for the re-run failed jobs button.

But also, that doesn't make sense. If I see failed jobs, I should be able to re-run them individually without having to wait for all the jobs to finish.

piotrekkr · 2024-04-18T07:27:46Z

Yeah, it would be nice to be able to do this. However, I think GitHub needs to store the state of the whole workflow run before you can rerun parts of it. Some jobs depend on the results of other jobs (even if those jobs failed). They probably wait for all jobs to be executed (skipped, failed, cancelled or successful), store the workflow run state somewhere, and only then know which jobs' states to "copy" from the previous run and which jobs to rerun.

Omzig · 2024-04-18T18:19:07Z

In Azure, I have to wait for the jobs to finish before I can rerun them...

tvdeyen added the enhancement New feature or request label Apr 17, 2020

TingluoHuang added service Service Feature Feature scope to the pipelines service and launch app and removed enhancement New feature or request service labels Jun 8, 2020

ygj6 mentioned this issue Oct 28, 2020

Build: Migrate CI to GitHub Actions jquery/jquery#4800

Merged

dentarg mentioned this issue Jan 3, 2021

Add #string method to Puma::NullIO puma/puma#2520

Merged

lbruun mentioned this issue Jan 25, 2021

Migrate build from Travis CI to GitHub Actions. apache/netbeans#2708

Closed

harshanarayana mentioned this issue Mar 28, 2021

GIT-2023: Enable GitHub Actions support sanic-org/sanic#2050

Merged

jklenzing mentioned this issue May 11, 2021

TST: GitHub actions pysat/pysatMadrigal#45

Merged

This was referenced Jun 1, 2021

CI: Move from Travis to GitHub Actions iputils/iputils#336

Closed

Migrate from Travis CI to GitHub Actions linux-test-project/ltp#761

Closed

rkm mentioned this issue Mar 1, 2022

Switch CI to GitHub Actions SMI/SmiServices#1074

Merged

ethomson closed this as completed Mar 1, 2022

bartlettroscoe mentioned this issue Mar 16, 2022

Trilinos auto PR tester stability issues trilinos/Trilinos#3276

Closed

andreaskaris mentioned this issue Mar 17, 2022

Retest failed command improvements and /retest-failed action ovn-org/ovn-kubernetes#2867

Merged

radarhere mentioned this issue Apr 1, 2022

Enable re-running failed jobs python-pillow/pillow-wheels#277

Closed

AlekseyMartynov mentioned this issue Aug 16, 2022

Remove 'last_run_status' trick DevExpress/DevExtreme#22369

Merged

Felixoid mentioned this issue Dec 18, 2023

GitHub Actions Dammaz Kron Felixoid/actions-experiments#9

Open
