New Feature: Multi Cluster TargetGroupBindings #3853

Open
wants to merge 6 commits into
base: main
@zac-nixon zac-nixon commented Sep 16, 2024

Issue

#2173

Description

An initial approach was done here, but it didn't handle all use cases, such as instance target groups or clusters that share subnets. This new approach works with both IP and instance targets and makes no assumptions about the clusters' composition, subnets, etc.

This PR adds a new flag to the TGB CR that lets users specify that a TGB's TG ARN might be associated with more than one cluster. For managed TGBs, this flag can also be set from ingress / svc annotations.

The basis of the solution is to maintain a per-TGB config map that tracks which targets have been registered in the ELB API. When performing deregistrations, we first check the TGB config map and filter out any targets that are not recorded in it. We maintain an in-memory cache of this config map to reduce reads to the k8s API. Every reconcile cycle that changes the endpoints also incurs a write of this config map.
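The filtering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the type `registeredTargets` and the functions `filterDeregistrations` and `recordReconcile` are hypothetical names, the set stands in for the in-memory cache of the per-TGB config map, and target IDs are modeled as plain strings.

```go
package main

import "sort"

// registeredTargets is a hypothetical in-memory mirror of the per-TGB
// config map: the set of target IDs recorded as registered by this TGB.
type registeredTargets map[string]struct{}

// filterDeregistrations keeps only the deregistration candidates that are
// recorded in the tracked set. Targets that are not recorded (for example,
// ones registered by another cluster sharing the target group) are dropped.
func filterDeregistrations(owned registeredTargets, candidates []string) []string {
	var out []string
	for _, id := range candidates {
		if _, ok := owned[id]; ok {
			out = append(out, id)
		}
	}
	sort.Strings(out) // deterministic order, purely for readability
	return out
}

// recordReconcile updates the tracked set after a reconcile that changed
// endpoints. In a real controller this is the point where the config map
// write would happen; here only the in-memory set is updated.
func recordReconcile(owned registeredTargets, registered, deregistered []string) {
	for _, id := range registered {
		owned[id] = struct{}{}
	}
	for _, id := range deregistered {
		delete(owned, id)
	}
}
```

With this shape, a target that appears in the target group but was never recorded by this TGB is simply not returned by `filterDeregistrations`, which is the property that lets several clusters share one target group.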

I've placed some warnings in the documentation about potential pitfalls of this solution. For example, moving an existing TGB to multicluster support could leak target deregistrations while the config map state is being generated. The other warning is about flipping between multicluster and non-multicluster modes, for the same reason.

I've load tested this feature, and on a relatively large target group of 150 targets, reconciles take roughly 25 ms longer due to the write of the CM. This 25 ms represented about a 5% latency increase in my cluster.

Tests done

  • IP targets in multiple clusters
  • Instance targets in multiple clusters
  • NLB w/ shared target groups
  • ALB w/ shared target groups
  • Load test with moderately sized TGB (150 replicas)

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zac-nixon
Once this PR has been reviewed and has the lgtm label, please assign m00nf1sh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 16, 2024
@k8s-ci-robot
Contributor

Hi @zac-nixon. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 16, 2024
@shraddhabang
Collaborator

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 23, 2024
| [alb.ingress.kubernetes.io/conditions.${conditions-name}](#conditions) | json |N/A| Ingress | N/A |
| [alb.ingress.kubernetes.io/target-node-labels](#target-node-labels) | stringMap |N/A| Ingress,Service | N/A |
| [alb.ingress.kubernetes.io/mutual-authentication](#mutual-authentication) | json |N/A| Ingress | Exclusive |
| [alb.ingress.kubernetes.io/multi-cluster-target-group](#multi-cluster-target-group) | boolean |N/A| Ingress, Service | Merge |
Collaborator

The merge behavior does not apply here, as each ingress in a group creates its own target group, and users would not necessarily want to merge this setting across the ingresses sharing the group.

Author

What do you recommend for this setting?

docs/guide/targetgroupbinding/targetgroupbinding.md (outdated, resolved)
pkg/backend/endpoint_types.go (resolved)
pkg/ingress/model_build_target_group_test.go (outdated, resolved)
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 1, 2024
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

3 participants