Secret deleted by the garbage collector with delay #1599

Open
aso-adeo opened this issue Sep 11, 2024 · 2 comments · May be fixed by #1605
Labels
triage (Issues/PRs that need to be reviewed)

Comments

@aso-adeo

Which component:
sealed-secrets-controller:0.27.1

Describe the bug
Argo CD replaces the SealedSecret: it deletes it and recreates it within a few milliseconds.
The sealed-secrets controller then cannot unseal the new SealedSecret because the previously unsealed Secret still exists.
A few seconds later (under 5 seconds of delay), the garbage collector sees that the Secret's ownerReference points to the UID of the old, now-deleted SealedSecret, and deletes the Secret.
Since the controller has already given up unsealing the SealedSecret after 5 attempts, the Secret is never recreated and we are left without it.
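
For anyone who wants to confirm this state on a cluster, here is a small diagnostic sketch (not part of the controller; the namespace and name `default`/`my-secret` are placeholders) that compares the UID recorded in the Secret's ownerReference with the UID of the live SealedSecret. When Argo CD has just deleted and recreated the SealedSecret, the two differ, which is exactly what makes the garbage collector delete the Secret:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	core := kubernetes.NewForConfigOrDie(cfg)
	dyn := dynamic.NewForConfigOrDie(cfg)

	ns, name := "default", "my-secret" // placeholders

	// The Secret that was unsealed earlier, still carrying the old ownerReference.
	secret, err := core.CoreV1().Secrets(ns).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The SealedSecret that Argo CD just recreated, which has a new UID.
	gvr := schema.GroupVersionResource{Group: "bitnami.com", Version: "v1alpha1", Resource: "sealedsecrets"}
	ss, err := dyn.Resource(gvr).Namespace(ns).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	for _, ref := range secret.OwnerReferences {
		if ref.Kind == "SealedSecret" {
			fmt.Printf("ownerReference UID: %s\nlive SealedSecret UID: %s\nstale (GC will delete): %v\n",
				ref.UID, ss.GetUID(), ref.UID != ss.GetUID())
		}
	}
}
```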

To Reproduce
It's not easily reproducible: it did not happen on every cluster where we ran this scenario.

Expected behavior
We expect the sealed-secrets controller to retry unsealing the secret with an exponential backoff instead of making all of its attempts within a few milliseconds.
It is not rare for the garbage collector to lag behind in its actions.
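
For reference, here is a minimal sketch of the common client-go workqueue retry pattern (not the actual sealed-secrets code; `maxRetries`, the 5 ms base delay, and `unseal` are illustrative placeholders). The per-item backoff doubles on every failure, so the retry cap is what decides how long the controller keeps trying before it gives up for good:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

// maxRetries caps how many times a failed key is requeued; raising it is what
// would give a lagging garbage collector time to remove the stale Secret.
const maxRetries = 5

func main() {
	queue := workqueue.NewRateLimitingQueue(
		// Per-item delays: 5ms, 10ms, 20ms, ... capped at 1000s.
		workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
	)
	defer queue.ShutDown()

	queue.Add("default/my-secret")

	for {
		key, shutdown := queue.Get()
		if shutdown {
			return
		}

		err := unseal(key.(string)) // stand-in for the real reconcile/unseal step
		switch {
		case err == nil:
			queue.Forget(key) // success: reset this key's backoff counter
		case queue.NumRequeues(key) < maxRetries:
			queue.AddRateLimited(key) // requeue with the next (doubled) delay
		default:
			queue.Forget(key) // give up: the Secret will never be recreated
			fmt.Println("dropping", key, "after", maxRetries, "retries")
			queue.Done(key)
			return
		}
		queue.Done(key)
	}
}

// unseal always fails here, mimicking the window in which the stale Secret
// still exists and the controller cannot take ownership of it.
func unseal(key string) error {
	return fmt.Errorf("secret %q already exists", key)
}
```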

Version of Kubernetes:
1.28 & 1.29

  • Output of kubectl version:
Server Version: v1.29.8-gke.1031000
aso-adeo added the triage (Issues/PRs that need to be reviewed) label Sep 11, 2024
@alemorcuq (Collaborator) commented Sep 29, 2024

The retries are already done with an exponential backoff, but because the retry limit is just 5 and the initial delays are only a few milliseconds, all of the attempts happen too fast:

$ kubectl logs -n kube-system deploy/sealed-secrets-controller | grep Updating | nl -v 0
     0	time=2024-09-29T11:49:35.764Z level=INFO msg=Updating key=default/my-secret
     1	time=2024-09-29T11:49:35.777Z level=INFO msg=Updating key=default/my-secret
     2	time=2024-09-29T11:49:35.794Z level=INFO msg=Updating key=default/my-secret
     3	time=2024-09-29T11:49:35.822Z level=INFO msg=Updating key=default/my-secret
     4	time=2024-09-29T11:49:35.867Z level=INFO msg=Updating key=default/my-secret
     5	time=2024-09-29T11:49:35.955Z level=INFO msg=Updating key=default/my-secret
     6	time=2024-09-29T11:49:36.123Z level=INFO msg=Updating key=default/my-secret
     7	time=2024-09-29T11:49:36.451Z level=INFO msg=Updating key=default/my-secret
     8	time=2024-09-29T11:49:37.098Z level=INFO msg=Updating key=default/my-secret
     9	time=2024-09-29T11:49:38.389Z level=INFO msg=Updating key=default/my-secret
    10	time=2024-09-29T11:49:40.957Z level=INFO msg=Updating key=default/my-secret
    11	time=2024-09-29T11:49:46.088Z level=INFO msg=Updating key=default/my-secret
    12	time=2024-09-29T11:49:56.338Z level=INFO msg=Updating key=default/my-secret
    13	time=2024-09-29T11:50:16.823Z level=INFO msg=Updating key=default/my-secret
    14	time=2024-09-29T11:50:57.793Z level=INFO msg=Updating key=default/my-secret
    15	time=2024-09-29T11:52:19.723Z level=INFO msg=Updating key=default/my-secret
    16	time=2024-09-29T11:55:03.572Z level=INFO msg=Updating key=default/my-secret

As you can see, each retry doubles the previous wait time, but it's not until the 9th retry that it starts waiting more than 1 second, and by the 15th retry it is already waiting more than 1 minute.

A quick solution for this would be to just increase the number of max retries (currently at 5). What do you think?
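
As a back-of-the-envelope check (assuming the backoff starts at roughly 5 ms and doubles on every retry, which is what the timestamps in the log above suggest), this is how long the controller keeps retrying in total for a given cap:

```go
package main

import (
	"fmt"
	"time"
)

// totalBackoff sums the waits for a doubling backoff starting at base.
func totalBackoff(retries int, base time.Duration) time.Duration {
	var total time.Duration
	delay := base
	for i := 0; i < retries; i++ {
		total += delay
		delay *= 2
	}
	return total
}

func main() {
	base := 5 * time.Millisecond
	fmt.Println("5 retries: ", totalBackoff(5, base))  // 155ms: easily beaten by a slow garbage collector
	fmt.Println("15 retries:", totalBackoff(15, base)) // ~2m43s: plenty of margin
}
```

So raising the cap to around 15 stretches the retry window from well under a second to a few minutes, which matches the spacing of the later retries in the log above.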

cc @agarcia-oss

@aso-adeo (Author)

Yes, I think that we should increase the default number of max retries to 15.

alemorcuq linked pull request #1605 on Oct 2, 2024 that will close this issue