argocd-server does not try to reconnect to redis on DNS error #6336

Closed
3 tasks done
dpkirchner opened this issue May 26, 2021 · 2 comments · Fixed by #7207
Labels
bug Something isn't working workaround There's a workaround, might not be great, but exists

@dpkirchner

dpkirchner commented May 26, 2021

If you are trying to resolve an environment-specific issue or have a one-off question about the edge case that does not require a feature then please consider asking a question in argocd slack channel.

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

argocd-server does not try to reconnect to redis on DNS error. I don't know if it retries after other errors.

To Reproduce

This is going to be somewhat circuitous.

  • Install the argocd helm chart (I'm using chart 3.6.2, which installs argocd 2.0.1)
  • Wait until it's all installed and running
  • Log in to argo on the web UI (I'm not exactly sure this is necessary, but it helps verify that everything's up)
  • Save a copy of the argocd-redis svc: kubectl get svc argocd-redis -o yaml > tmp
  • Delete the argocd-redis svc: kubectl delete svc argocd-redis
  • Delete the argocd-server pod: kubectl delete pod -l app.kubernetes.io/name=argocd-server
  • Wait for argocd-server to restart
  • Watch the logs on the argocd-server pod; you'll see an error once per minute: kubectl logs -l app.kubernetes.io/name=argocd-server -f
  • Recreate the argocd-redis svc: kubectl create -f tmp
  • Wait a minute or two; you'll still see the same redis errors, even though the svc exists again

Edited to add: This also affects argocd-application-controller.

Expected behavior

I expected argocd-server to try to reconnect to redis after a DNS failure.

Workaround

If you see this error, restart the argocd-server pod after you're sure the argocd-redis svc is created.

Version

argocd@argocd-server-755cf78c88-5d8ln:~$ argocd version
argocd: v2.0.1+d1d9a54
  BuildDate: 2021-05-05T18:03:36Z
  GitCommit: d1d9a542894a158c5f30daf1720318669a996c05
  GitTreeState: clean
  GoVersion: go1.16
  Compiler: gc
  Platform: linux/amd64

Note: This is a local build of argocd (hence the different git commit hash) but it is one temporary commit ahead of the official v2.0.1 build: team-rhino@d1d9a54 (this issue is resolved in a new mainline commit, I just haven't had a chance to update to it yet).

Logs

time="2021-05-26T21:15:19Z" level=warning msg="Failed to resync revoked tokens. retrying again in 1 minute: dial tcp: lookup argocd-redis on 10.179.0.10:53: no such host"
@dpkirchner dpkirchner added the bug Something isn't working label May 26, 2021
@jannfis jannfis added the bug/in-triage This issue needs further triage to be correctly classified label May 27, 2021
@jessesuen
Member

Since we pass the hostname to the redis golang client, this implies that the client itself does not recover from DNS changes. Sure enough, there is a go-redis issue for this: redis/go-redis#1127

@jessesuen
Member

The ugly workaround for this issue is to bounce the client (in this case argocd-server, but it could even be argocd-application-controller).

@jessesuen jessesuen added the workaround There's a workaround, might not be great, but exists label Jun 1, 2021
@alexmt alexmt removed the bug/in-triage This issue needs further triage to be correctly classified label Jun 9, 2021
@alexmt alexmt added this to the v2.2 milestone Jul 29, 2021
@pasha-codefresh pasha-codefresh self-assigned this Sep 12, 2021