Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: infer managed resources health from redis instead of storing it in CRD #10191

Merged
merged 2 commits into from
Aug 17, 2022

Conversation

alexmt
Copy link
Collaborator

@alexmt alexmt commented Aug 4, 2022

Signed-off-by: Alexander Matyushentsev AMatyushentsev@gmail.com

Argo CD historically stores managed resources' health details in Application CRD.

At some point, Argo CD started collecting health status for the whole resource hierarchy so now health status is duplicated in CRD and decided to store new health status in Redis to reduce application CRD side and reduce the number of updates.

This PR is the continuation of the original effort and introduces the following changes:

  • controller no longer stores health status in app CRD
  • API server infers health status from Redis and populate health status fields while serving application API requests

The change slightly reduces Application CRD but, most importantly, significantly reduces the number of application CRD updates. E.g. before, an application that manages Deployment with 10 pods would need to be updated 10+ times during rollout. The only change would be the change in health messages such as Waiting for rollout to finish: 7 of 10 updated replicas are available.... With changes, the application CRD would be updated just once.

For backward compatibility controller, by default, keep storing health status in CRD. The new behavior can be enabled using ARGOCD_PERSIST_RESOURCE_STATUS_HEALTH=true env variable.

@codecov
Copy link

codecov bot commented Aug 4, 2022

Codecov Report

Attention: Patch coverage is 74.46809% with 12 lines in your changes missing coverage. Please review.

Project coverage is 45.94%. Comparing base (42863a4) to head (46cdd7d).
Report is 2619 commits behind head on master.

Files Patch % Lines
server/application/application.go 45.45% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10191      +/-   ##
==========================================
+ Coverage   45.92%   45.94%   +0.01%     
==========================================
  Files         229      229              
  Lines       28258    28268      +10     
==========================================
+ Hits        12978    12987       +9     
  Misses      13510    13510              
- Partials     1770     1771       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should docs be added explaining the benefit of using the option? Is there any plan to deprecate the CRD-storage mechanism?

Also I'm not sure if this is really testable... seems like maybe at least inferResourcesStatusHealth should have a test.

common/common.go Outdated Show resolved Hide resolved
controller/health.go Outdated Show resolved Hide resolved
@@ -1897,6 +1905,22 @@ func (s *Server) GetApplicationSyncWindows(ctx context.Context, q *application.A
return res, nil
}

func (s *Server) inferResourcesStatusHealth(app *v1alpha1.Application) {
if !argocommon.PersistResourceStatusHealth {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this boolean be part of the Application cache key?

The upside would be that, after someone toggles on the option, there's no moment when health statuses are not returned.

The downside would be that, after someone toggles on the option, the entire Application cache is now out of date.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got, hopefully, a good idea: introduced resourceHealthSource field in application status that indicates if resources health is in CRD (by default) or in application tree.

This eliminates the requirement to coordinate new flag value between application controller & API server. Another benefit is that health status will be gradually removed from CRDs by controller.

Copy link
Member

@jannfis jannfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not remove the health status from the CRD, as many folks building solutions around Argo CD may use this information to feed their own systems instead of going through the API (which imho, is a perfectly fine use case).

So from my point of view, this is a heavily breaking change.

Just putting my Request for Change on here to prevent early merge and to facilitate further discussion on this :)

@crenshaw-dev
Copy link
Member

I think Alex intends this feature to be opt-in rather than opt-out. If we find it super valuable, we could consider enabling it by default in 3.0. But as the PR is written now, I don't think it's a breaking change.

@jannfis
Copy link
Member

jannfis commented Aug 10, 2022

Yeah, I was a little quick with making my assumption. Sorry - I'm tired and stupid :)

You're right. It's even mentioned it in the PR's description.

@crenshaw-dev
Copy link
Member

I'm tired and stupid

Nah, just tired. And I can relate. 😴

@alexmt alexmt force-pushed the managed-resources-health branch 3 times, most recently from 2e08d42 to 1b1fdac Compare August 10, 2022 21:55
@alexmt
Copy link
Collaborator Author

alexmt commented Aug 10, 2022

Thank you for review @jannfis and @crenshaw-dev ! PTAL

@crenshaw-dev
Copy link
Member

@alexmt conflict time. :-)

Copy link
Member

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nitpick!

Should there be a note somewhere in documentation about when/why this might be helpful (and that we might make it the default in 3.0, if that's the plan)?

@alexmt
Copy link
Collaborator Author

alexmt commented Aug 12, 2022

Should there be a note somewhere in documentation about when/why this might be helpful (and that we might make it the default in 3.0, if that's the plan)?

I could not find and dedicated file that would be appropriate. So documented the setting in argocd-cmd-params-cm.yaml

Copy link
Member

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@crenshaw-dev
Copy link
Member

@jannfis PTAL

@alexmt alexmt enabled auto-merge (squash) August 16, 2022 23:10
Copy link
Member

@jannfis jannfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

…t in CRD

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
@alexmt alexmt merged commit 893a867 into argoproj:master Aug 17, 2022
ciiay pushed a commit to ciiay/argo-cd that referenced this pull request Aug 18, 2022
…t in CRD (argoproj#10191)

* chore: infer managed resources health from redis instead of storing it in CRD

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>

* apply reviewer notes

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Signed-off-by: Yi Cai <yicai@redhat.com>
crenshaw-dev added a commit that referenced this pull request Aug 19, 2022
* fix: missing actions (#10327) (#10359)

Signed-off-by: CI <michael@crenshaw.dev>

Signed-off-by: CI <michael@crenshaw.dev>
Signed-off-by: Yi Cai <yicai@redhat.com>

* chore: infer managed resources health from redis instead of storing it in CRD (#10191)

* chore: infer managed resources health from redis instead of storing it in CRD

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>

* apply reviewer notes

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* fix: Add logic to handle for fileHandle.Close() (#9963) (#10361)

Signed-off-by: xin.li <xin.li@daocloud.io>

Signed-off-by: xin.li <xin.li@daocloud.io>
Signed-off-by: Yi Cai <yicai@redhat.com>

* docs: fix typo in upgrade notes (#10377)

Signed-off-by: Xijun Dai <daixijun1990@gmail.com>

Signed-off-by: Xijun Dai <daixijun1990@gmail.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* fix: add space before prompt in CLI (#10362)

Signed-off-by: xin.li <xin.li@daocloud.io>

Signed-off-by: xin.li <xin.li@daocloud.io>
Signed-off-by: Yi Cai <yicai@redhat.com>

* docs: fix indentation of example AppProject in 'Sync Windows' documentation (#10388)

Signed-off-by: Yi Cai <yicai@redhat.com>

* fix: Correctly assume cluster-scoped resources to be self-referenced (#10390)

Signed-off-by: jannfis <jann@mistrust.net>

Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: Yi Cai <yicai@redhat.com>

* ui-make-https-repo-credential-editable

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* Minor format fix

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* Minor fix for unclickable input field

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* Updates for comments

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* ui-make-https-repo-credential-editable

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* Minor format fix

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* Minor fix for unclickable input field

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

* Updates for comments

Signed-off-by: ciiay <yicai@redhat.com>
Signed-off-by: Yi Cai <yicai@redhat.com>

Signed-off-by: CI <michael@crenshaw.dev>
Signed-off-by: Yi Cai <yicai@redhat.com>
Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Signed-off-by: xin.li <xin.li@daocloud.io>
Signed-off-by: Xijun Dai <daixijun1990@gmail.com>
Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: ciiay <yicai@redhat.com>
Co-authored-by: Michael Crenshaw <michael@crenshaw.dev>
Co-authored-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Co-authored-by: my-git9 <xin.li@daocloud.io>
Co-authored-by: Xijun Dai <daixijun1990@gmail.com>
Co-authored-by: Jun Duan <jun.duan@ibm.com>
Co-authored-by: jannfis <jann@mistrust.net>
@sbose78
Copy link
Contributor

sbose78 commented Sep 22, 2022

I think we should not remove the health status from the CRD, as many folks building solutions around Argo CD may use this information to feed their own systems

+1 on this.

@dlemfh
Copy link

dlemfh commented Nov 9, 2022

Hey @alexmt (and community)!

May I ask, will setting controller.resource.health.persist: 'false' change the outcome of argocd app get <application_name> commands?

I want to use the benefits of the flag, but my ci/cd pipeline depends on resource health outputs of argocd app get <application_name>.

I've confirmed that using the false setting does indeed remove resource health statuses from the Application CRD, but I've also checked (from eye-reading) that the output of argocd app get <application_name> stays the same and still provides resource health status info (on argocd v2.5.2).

Can I trust that moving forward (even in v3.0), argocd cli output will stay the same? (Or at least that change in cli behavior will be properly documented/announced?)

Thanks always! 🤩

@alexmt
Copy link
Collaborator Author

alexmt commented Nov 9, 2022

Hello @dlemfh ,

CLI/API behavior won't change after setting controller.resource.health.persist: 'false'. API server retrieves health information from Redis.

You can assume that v2.x won't have breaking API/CLI changes. v3.0 might introduce it but we will make sure to notify about such change ahead of time and describe migration path. Currently there are no plans to introduce v3.0

@dlemfh
Copy link

dlemfh commented Nov 10, 2022

@alexmt Awesome, thanks!! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants