
Self Healing: Allow AWS Controller to Detect and Fix AWS Resource Changes on Interval #2800

Closed
lucas-howard-macmillan opened this issue Sep 14, 2022 · 12 comments
Labels: lifecycle/rotten · triage/needs-investigation


lucas-howard-macmillan commented Sep 14, 2022

Is your feature request related to a problem?

If a load balancer is deleted through the AWS Console, the AWS Load Balancer Controller does not notice or re-create it.

The AWS load balancer controller must be restarted, and then the missing load balancer is recreated.

Describe the solution you'd like

An argument that can be passed to the controller telling it to do a full scan of AWS on a configurable interval, in order to detect and fix drift between the actual AWS state and the expected state.

This would essentially emulate the reconciliation the AWS Load Balancer Controller already performs when it starts up.

For large deployments, a segment-size (batch) argument might also be needed, e.g.:

Every 5 minutes scan AWS for 100 ingresses, then over the next 5 minutes the next 100 ingresses, and so on (a rough sketch of this follows below).
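
Purely to illustrate the interval-plus-batch mechanics being requested, here is a rough, hypothetical sketch written against controller-runtime's client: on each tick it lists the cluster's Ingresses and "touches" the next batch with a timestamp annotation, so that the controller's watch fires and those Ingresses are reconciled again. The interval, batch size, and annotation key below are made up for illustration, they are not existing controller flags, and this is not a verified workaround for the deleted-ALB case above.

package main

import (
	"context"
	"log"
	"time"

	networkingv1 "k8s.io/api/networking/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

const (
	rescanInterval = 5 * time.Minute               // hypothetical "--drift-rescan-interval"
	batchSize      = 100                           // hypothetical "--drift-rescan-batch-size"
	touchKey       = "example.com/drift-rescan-at" // hypothetical annotation key
)

func main() {
	c, err := client.New(config.GetConfigOrDie(), client.Options{})
	if err != nil {
		log.Fatal(err)
	}

	offset := 0
	for range time.Tick(rescanInterval) {
		var ingresses networkingv1.IngressList
		if err := c.List(context.Background(), &ingresses); err != nil {
			log.Printf("list ingresses: %v", err)
			continue
		}

		// Walk the Ingress list in fixed-size batches, one batch per tick,
		// wrapping around once the end of the list is reached.
		items := ingresses.Items
		if offset >= len(items) {
			offset = 0
		}
		end := offset + batchSize
		if end > len(items) {
			end = len(items)
		}

		for i := offset; i < end; i++ {
			ing := items[i].DeepCopy()
			base := ing.DeepCopy()
			if ing.Annotations == nil {
				ing.Annotations = map[string]string{}
			}
			// Updating an annotation generates a watch event, which re-queues
			// the Ingress for reconciliation by whatever controller owns it.
			ing.Annotations[touchKey] = time.Now().UTC().Format(time.RFC3339)
			if err := c.Patch(context.Background(), ing, client.MergeFrom(base)); err != nil {
				log.Printf("patch ingress %s/%s: %v", ing.Namespace, ing.Name, err)
			}
		}
		offset = end
	}
}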

Describe alternatives you've considered

I have tried all of the existing arguments, such as the sync period, but none of them cause the load balancer to be re-created.
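
For reference, a minimal sketch of the knob that "sync period" style flags usually map to in controller-runtime based controllers. The exact flag name and the location of the option vary by controller and controller-runtime version, and, as reported above, this resync alone did not cause a deleted load balancer to be re-created.

package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Minimal sketch: the periodic resync that "sync period" flags typically
	// configure. It re-queues objects from the informer cache on a timer; as
	// reported in this issue, that by itself did not re-create a load
	// balancer deleted out-of-band in the AWS console.
	syncPeriod := 10 * time.Hour
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// Note: newer controller-runtime releases moved this field into the
		// cache options; older releases expose it directly as shown here.
		SyncPeriod: &syncPeriod,
	})
	if err != nil {
		panic(err)
	}
	_ = mgr.Start(ctrl.SetupSignalHandler())
}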

kishorj (Collaborator) commented Sep 14, 2022

/assign @M00nF1sh
Investigate the periodic sync issue further.
This is similar to #2515


lukonjun commented Oct 4, 2022

Experienced the same behaviour. I assumed that when I deleted the LB via the AWS Console the ALB controller would automatically recreate it; however, it did not.


dongho-jung commented Oct 25, 2022

I'm experiencing the same issue. It only gets recovered when the number of replicas behind the service is changed.

I expected it would get recovered periodically, according to the constants below:

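// Poll intervals and timeouts used while waiting for a TargetGroupBinding
// (TGB) to be observed or deleted; note the poll intervals are 200 milliseconds.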
const (
	defaultWaitTGBObservedPollInterval = 200 * time.Millisecond
	defaultWaitTGBObservedTimeout      = 60 * time.Second
	defaultWaitTGBDeletionPollInterval = 200 * time.Millisecond
	defaultWaitTGBDeletionTimeout      = 60 * time.Second
)


ChrisV78 commented Nov 18, 2022

Experienced the same by accident: I removed the wrong ALB from the AWS console, and the lb-controller only recreated the ALB the moment I restarted the lb-controller deployment.


mxkmp commented Nov 23, 2022

Yes, unfortunately that's the only fix possible at the moment. I think in an older version (before the renaming) it was recreated automatically. Can this please be fixed?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2023
lucas-howard-macmillan (Author) commented Feb 21, 2023 via email

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2023
@lucas-howard-macmillan (Author)

> Yes, unfortunately that's the only fix possible at the moment. I think in an older version (before the renaming) it was recreated automatically. Can this please be fixed?

In previous versions, it did automatically modify or recreate resources when it detected that the AWS resources were not correct.

While running a previous version, we had an incident where multiple load balancers were accidentally deleted, and by the time we were notified there was an issue, the controller had already re-created them.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 13, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 13, 2023
@oliviassss (Collaborator)

Hi, we have shipped the fix in v2.5.4; please check the details in our release notes: https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/tag/v2.5.4.
I'm closing this ticket for now; please feel free to reach out or reopen if you have any issues. Thanks!

@lucas-howard-macmillan (Author)

@oliviassss Thank You!
