fix: add backoff to operator retry mechanism (#650)
## Description
Adds a delay between operator retries to fix issues where temporary
failures cause `status=Failed` for Packages.

Open to suggestions/thoughts on the delay time, but this is the current
breakdown:
| Retry Attempt (c) | backOffSeconds (3^c) |
|-------------------|----------------------|
|         1         |           3          |
|         2         |           9          |
|         3         |          27          |
|         4         |          81          |

It might make more sense to retry more frequently with a shorter delay
(an exponential schedule may not be the right fit here).
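
To make the trade-off easier to compare, here is a minimal sketch that reproduces the table above and sets it next to one possible alternative, a capped schedule. The helper names `backoffSeconds` and `cappedBackoffSeconds` and the 30 second cap are illustrative assumptions, not part of this change.

```typescript
// Delay used by this change: 3^retryAttempt seconds.
function backoffSeconds(retryAttempt: number): number {
  return 3 ** retryAttempt;
}

// Hypothetical alternative: same curve, clamped to a maximum wait.
// The 30 second default cap is an assumption chosen for illustration.
function cappedBackoffSeconds(retryAttempt: number, capSeconds = 30): number {
  return Math.min(3 ** retryAttempt, capSeconds);
}

const attempts = [1, 2, 3, 4];

// Schedule from the table: [3, 9, 27, 81]
const schedule = attempts.map(a => backoffSeconds(a));

// Capped variant: [3, 9, 27, 30]
const cappedSchedule = attempts.map(a => cappedBackoffSeconds(a));

// Cumulative wait across four retries with the uncapped schedule: 120 seconds
const totalWaitSeconds = schedule.reduce((sum, s) => sum + s, 0);
```

With the uncapped schedule, a Package that needs all four retries listed in the table spends two minutes waiting in total, most of it on the last attempt.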

Example of what this looks like from the logs:
```bash
(⎈|k3d-uds:default) uds-core (backoff) kubectl logs -l pepr.dev/controller=watcher -n pepr-system --tail -1 | grep "seconds before" | jq '.msg'
"Waiting 3 seconds before processing Package grafana/grafana, status.phase: Retrying, observedGeneration: 1, retryAttempt: 1"
"Waiting 3 seconds before processing Package neuvector/neuvector, status.phase: Retrying, observedGeneration: 1, retryAttempt: 1"
"Waiting 9 seconds before processing Package grafana/grafana, status.phase: Retrying, observedGeneration: 1, retryAttempt: 2"
"Waiting 9 seconds before processing Package neuvector/neuvector, status.phase: Retrying, observedGeneration: 1, retryAttempt: 2"
"Waiting 27 seconds before processing Package grafana/grafana, status.phase: Retrying, observedGeneration: 1, retryAttempt: 3"
"Waiting 27 seconds before processing Package neuvector/neuvector, status.phase: Retrying, observedGeneration: 1, retryAttempt: 3"
```

## Related Issue
Fixes #649

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [ ] Test, docs, adr added or updated as needed
- [ ] [Contributor Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md) followed
rjferguson21 committed Aug 8, 2024
1 parent 6eee8f0 commit 52c97fd
Showing 1 changed file with 18 additions and 1 deletion.
19 changes: 18 additions & 1 deletion src/pepr/operator/reconcilers/package-reconciler.ts
@@ -1,4 +1,4 @@
-import { handleFailure, shouldSkip, updateStatus } from ".";
+import { handleFailure, shouldSkip, updateStatus, writeEvent } from ".";
 import { UDSConfig } from "../../config";
 import { Component, setupLogger } from "../../logger";
 import { enableInjection } from "../controllers/istio/injection";
@@ -35,6 +35,23 @@ export async function packageReconciler(pkg: UDSPackage) {
     return;
   }

+  if (pkg.status?.retryAttempt && pkg.status?.retryAttempt > 0) {
+    // calculate exponential backoff where backoffSeconds = 3^retryAttempt
+    const backOffSeconds = 3 ** pkg.status?.retryAttempt;
+
+    log.info(
+      metadata,
+      `Waiting ${backOffSeconds} seconds before processing package ${namespace}/${name}, status.phase: ${pkg.status?.phase}, observedGeneration: ${pkg.status?.observedGeneration}, retryAttempt: ${pkg.status?.retryAttempt}`,
+    );
+
+    await writeEvent(pkg, {
+      message: `Waiting ${backOffSeconds} seconds before retrying package`,
+    });
+
+    // wait for backOff seconds before retrying
+    await new Promise(resolve => setTimeout(resolve, backOffSeconds * 1000));
+  }
+
   // Migrate the package to the latest version
   migrate(pkg);

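
The wait added in this diff is an inline `setTimeout` wrapped in a Promise, which makes the reconciler slow to exercise in tests. One way to keep the same behavior while making it testable is to pull the delay into a helper that accepts the sleep function as a parameter. This is only a sketch: `waitForBackoff`, `SleepFn`, and `sleep` are hypothetical names that do not exist in the repository.

```typescript
// Hypothetical type for an injectable sleep function.
type SleepFn = (ms: number) => Promise<void>;

// Default sleep: resolves after `ms` milliseconds.
const sleep: SleepFn = ms => new Promise<void>(resolve => setTimeout(resolve, ms));

// Hypothetical helper mirroring the check and delay in the diff.
// Resolves to the number of seconds waited (0 when no retry is in progress).
async function waitForBackoff(
  retryAttempt: number | undefined,
  sleepFn: SleepFn = sleep,
): Promise<number> {
  if (!retryAttempt || retryAttempt <= 0) {
    return 0; // first attempt, no delay
  }

  // same formula as the change: backOffSeconds = 3^retryAttempt
  const backOffSeconds = 3 ** retryAttempt;
  await sleepFn(backOffSeconds * 1000);
  return backOffSeconds;
}
```

A test can then pass a fake `sleepFn` that records the requested milliseconds instead of actually waiting, while production code relies on the default.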
