Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the controller for API BGPPolicy #6203

Merged
merged 1 commit into from
Jul 25, 2024

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Apr 9, 2024

This commit implements the controller of API BGPPolicy, designed to advertise
Service IPs, Egress IPs, and Pod IPs to BGP peers from selected Kubernetes Nodes.

According to the spec of BGPPolicy, the Node selector is used to select Nodes
to which a BGPPolicy is applied. Multiple BGPPolicies can be applied to the
same Node. However, only the oldest BGPPolicy will be effective on a Node,
with others serving as alternatives. The effective one may be changed in the
following cases:

  • The current effective BGPPolicy is updated and not applied to the Node.
  • The current effective BGPPolicy is deleted.

The BGP server instance is only created and started for the effective BGPPolicy on
a Node. If the effective BGPPolicy is changed, the corresponding BGP server instance
will be terminated by calling the Stop method, and a new BGP server instance will
be created and started by calling the Start method for the new effective BGPPolicy.

To create a BGP server instance, ASN, router ID, and listen port must be specified.
The ASN and listen port are specified in the spec of the effective BGPPolicy. For router ID,
if the Kubernetes cluster is IPv4-only or dual-stack, we use the Node's IPv4 address
as the router ID, ensuring uniqueness. If the Kubernetes cluster is IPv6-only, where no
Node IPv4 address is available, the router ID could be specified via the Node annotation
node.antrea.io/bgp-router-id. If not present, a router ID will be generated by hashing
the Node name and update it to the Node annotation node.antrea.io/bgp-router-id.
Additionally, the stale BGP server instance will be terminated and a new BGP server
instance should be created and started when any of ASN, routerID, or listen port changes.

The information of the BGP peers is specified in the effective BGPPolicy. The unique
identification of a BGP peer is the peer IP address and peer ASN.

To reconcile the latest BGP peers:

  • Get the BGP peers to be added and add them by calling the AddPeer method of the
    BGP server instance.
  • Get the BGP peers to be deleted and delete them by calling the RemovePeer method
    of the BGP server instance.
  • Get the remaining BGP peers and calculate the updated BGP peers, then update them by
    calling the UpdatePeer method of the BGP server instance.

The information of the IPs to be advertised can be calculated from the spec of the
effective BGPPolicy. Currently, we advertise the IPs and CIDRs to all the BGP peers.

To reconcile the latest IPs to all BGP peers:

  • If the BGP server instance is newly created and started, advertise all the IPs by
    calling the AdvertiseRoutes method.
  • If the BGP server instance is not newly created and started:
    • Get the IPs/CIDRs to be added and advertise them by calling the AdvertiseRoutes method.
    • Get the IPs/CIDRs to be removed and withdraw them by calling the WithdrawRoutes method.

The feature is gated by the alpha BGPPolicy feature gate and only supported in Linux.

@hongliangl hongliangl added kind/feature Categorizes issue or PR as related to a new feature. action/release-note Indicates a PR that should be included in release notes. labels Apr 9, 2024
@rajnkamr rajnkamr added the area/transit/bgp Issues or PRs related to BGP support. label Apr 11, 2024
@hongliangl hongliangl added this to the Antrea v2.1 release milestone May 24, 2024
@hongliangl hongliangl force-pushed the 20240222-bgp-controller-v2 branch 2 times, most recently from f209e2a to 3063737 Compare May 30, 2024 10:40
@hongliangl hongliangl force-pushed the 20240222-bgp-controller-v2 branch 3 times, most recently from 57773d9 to c9b727a Compare June 6, 2024 12:25
@hongliangl hongliangl force-pushed the 20240222-bgp-controller-v2 branch 11 times, most recently from 095d7da to 5086b34 Compare June 19, 2024 07:23
@hongliangl hongliangl marked this pull request as ready for review June 19, 2024 07:26
@hongliangl hongliangl force-pushed the 20240222-bgp-controller-v2 branch 2 times, most recently from 3bab09b to 69c31a2 Compare June 20, 2024 05:55
@hongliangl hongliangl changed the title Add BGPPolicy controller Implement the controller for API BGPPolicy Jun 20, 2024
@hongliangl hongliangl requested a review from tnqn July 23, 2024 03:17
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Show resolved Hide resolved
return "", fmt.Errorf("failed to patch BGP router ID to Node annotation %s: %w", types.NodeBGPRouterIDAnnotationKey, err)
}
} else if !utilnet.IsIPv4String(routerID) {
return "", fmt.Errorf("BGP router ID should be an IPv4 address string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also correct the routeID with the annotation NodeBGPRouterIDAnnotationKey in this case? Considering it's owned by Antrea but there might be a case that users modified it accidentally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we are obligated to correct that since that's what the users should do by themselves. cc @tnqn

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically we shouldn't update user input which would cause unexpected results. If users ever set this annotation, they want to specify the router ID to a value which is more meaningful for them, instead of using the system generated one. If they set a wrong value and we override it to a correct but different value, they wouldn't know they input a invalid ID and would think this is not configurable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, right now there is no good signal that the value is invalid and cannot be used. I was wondering if we should update the annotation value to something like "<invalid value provided, please use a valid IPv4 address>". But I don't feel very strongly about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine as long as there is error log about it given this should only be used by advanced users who are familiar with BGP. And this is not the only case, the same situation applies to NPL, ServiceExternalIP, Service, Egress which takes configuration via user-provided annotation. Antrea users seem careful enough that we have never received bugs related to misconfigured annotations on other features :)

pkg/agent/controller/bgp/controller.go Show resolved Hide resolved
Comment on lines +621 to +575
if c.enabledIPv4 && utilnet.IsIPv4String(ip) {
allRoutes.Insert(bgp.Route{Prefix: ip + ipv4Suffix})
}
if c.enabledIPv6 && utilnet.IsIPv6String(ip) {
allRoutes.Insert(bgp.Route{Prefix: ip + ipv6Suffix})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wrap this as 'UpdateAllRoutes' and reused in the following function.

if node.GetName() != c.nodeName {
return
}
if reflect.DeepEqual(node.GetLabels(), oldNode.GetLabels()) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check labels and annotations only? it reminds me that the case for Flexible IPAM, Do we plan to support this? I suppose it's a valid case for BGP?
@tnqn @antoninbas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong but my understanding is Flexible IPAM is another way to address the problem BGP tries to resolve, by dividing the Node network into multiple flat VLAN subnets and routing between them via a router, so BGP is not required here.

It needs to check labels and annotations because they affect whether the policy applies to the node and the node router ID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think about it too much but it's an interesting question. I agree with Quan's observation for the current version of Antrea Flexible IPAM.
However, if we interpret "Flexible IPAM" as the ability to assign arbitrary IPs to Pods, without looking at the current implementation (which requires bridging mode), then maybe this is something that BGP can actually help achieve (when supported by the underlay) without requiring that the Node's transport interface be moved to the OVS bridge? What do you think?

Note: that would be a L3 approach to Flexible IPAM, not a L2 approach using VLANs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-opened this conversation not because it impacts this PR, but because I think it's a good thing to discuss

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we should decouple IPAM and implementation. I agree BGPPolicy can help achieve Flexible IPAM without briding mode. Perhaps we can track it via an issue. If my understanding is correct, it may require the BGP processes running on Nodes exchange routes, not only advertise routes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my understanding is correct, it may require the BGP processes running on Nodes exchange routes, not only advertise routes?

I would say probably, yes. And that's what the Calico BGP implementation does I think.
I don't know if it is strictly required. If we keep the current approach (Node IPAM + Flexible IPAM, with the flexibility to choose between the 2 for different workloads), then I can see how just disabling SNAT could be enough, assuming that for Pods using Flexible IPs the default route is used and traffic gets to a router to which Flexible IPs were advertised. But for more generic support, and for the ability to send traffic to the destination Node directly (e.g. when 2 Nodes are on the same subnet) I would say being able to learn routes over BGP is necessary.

@hongliangl
Copy link
Contributor Author

/test-all

@hongliangl hongliangl force-pushed the 20240222-bgp-controller-v2 branch 2 times, most recently from 840d1ed to fbb40aa Compare July 23, 2024 13:38
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
queue workqueue.RateLimitingInterface
}

func NewBGPPolicyController(ctx context.Context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit strange to pass the context in the New function. Would it make more sense to set this field in the Run method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Additionally, I have deleted ctx context.Context from struct Controller, and the ctx will be passed through function parameter to methods of goBGP interface, not a member of Controller.

pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
Comment on lines 456 to 427
bgpServer := c.bgpPolicyState.bgpServer
for key := range peerToAddKeys {
peerConfig := curPeerConfigs[key]
if err := bgpServer.AddPeer(c.ctx, peerConfig); err != nil {
return err
}
c.bgpPolicyState.peerConfigs[key] = peerConfig
}
for key := range peerToUpdateKeys {
peerConfig := curPeerConfigs[key]
if err := bgpServer.UpdatePeer(c.ctx, peerConfig); err != nil {
return err
}
c.bgpPolicyState.peerConfigs[key] = peerConfig
}
for key := range peerToDeleteKeys {
peerConfig := prePeerConfigs[key]
if err := bgpServer.RemovePeer(c.ctx, peerConfig); err != nil {
return err
}
delete(c.bgpPolicyState.peerConfigs, key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using c.ctx directly for the add / update / remove peer operations, I think it may make sense to create a child context with a reasonable timeout using https://pkg.go.dev/context#WithTimeout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm agreed that it will be better to have a reasonable timeout for calling add/update/remove peer operations, but I'm not sure about how to decide timeout.

pkg/agent/types/bgppolicy.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
}
}

func (c *Controller) Run(ctx context.Context, stopCh <-chan struct{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's an anti-pattern to pass both a context and a stop channel to the same function. They also "mean" the same thing as the context is generated from the stop channel in this case, and will be cancelled when the channel is closed.

I would recommend only keeping the context in this case, and you can call ctx.Done() when you need a channel.
You can also do the opposite (keep the channel) and use wait.ContextForChannel to get a context. But I would say keeping the context is probably a more "modern" / idiomatic approach.

Copy link
Contributor Author

@hongliangl hongliangl Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the detailed explanation!

I would recommend only keeping the context in this case, and you can call ctx.Done() when you need a channel.

I prefer this. It looks neater and more "modern".

pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
slices.Equal(oldSvc.Spec.ExternalIPs, svc.Spec.ExternalIPs) &&
slices.Equal(getIngressIPs(oldSvc), getIngressIPs(svc)) &&
oldSvc.Spec.ExternalTrafficPolicy == svc.Spec.ExternalTrafficPolicy &&
reflect.DeepEqual(oldSvc.Spec.InternalTrafficPolicy, svc.Spec.InternalTrafficPolicy) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It feels like a bit of a waste to use reflection to compare 2 *type values, when type is a primitive type. You could use https://pkg.go.dev/k8s.io/utils/ptr#Equal

I am not sure if this comment applies to other locations as well

Copy link
Contributor Author

@hongliangl hongliangl Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here ptr.Equal(oldSvc.Spec.InternalTrafficPolicy, svc.Spec.InternalTrafficPolicy) looks much better.

For other locations using reflect, there are pointers within the struct to compare, so I didn't change them.

if node.GetName() != c.nodeName {
return
}
if reflect.DeepEqual(node.GetLabels(), oldNode.GetLabels()) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think about it too much but it's an interesting question. I agree with Quan's observation for the current version of Antrea Flexible IPAM.
However, if we interpret "Flexible IPAM" as the ability to assign arbitrary IPs to Pods, without looking at the current implementation (which requires bridging mode), then maybe this is something that BGP can actually help achieve (when supported by the underlay) without requiring that the Node's transport interface be moved to the OVS bridge? What do you think?

Note: that would be a L3 approach to Flexible IPAM, not a L2 approach using VLANs

pkg/agent/controller/bgp/controller.go Show resolved Hide resolved
pkg/agent/controller/bgp/controller.go Outdated Show resolved Hide resolved
pkg/agent/controller/bgp/controller_test.go Outdated Show resolved Hide resolved
@hongliangl hongliangl force-pushed the 20240222-bgp-controller-v2 branch 3 times, most recently from 6ba5320 to 9692014 Compare July 25, 2024 04:56
@luolanzone
Copy link
Contributor

@hongliangl Thanks a lot for the quick response, @antoninbas @tnqn could you take another look? We plan to release 2.1 tomorrow, need to merge this as early as we can to move forward. Thanks!
btw, can we defer the e2e PR #6523 if there is no enough time to close it?

@hongliangl hongliangl force-pushed the 20240222-bgp-controller-v2 branch 3 times, most recently from c4c6fcb to d9cf1ee Compare July 25, 2024 13:40
go wait.UntilWithContext(ctx, c.worker, time.Second)

go c.secretInformer.Run(ctx.Done())
cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be waited before starting the worker, along with other informers? otherwise the worker would try to connect peer routers without providing any passwords first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Moved these two lines before go wait.UntilWithContext(ctx, c.worker, time.Second)

Comment on lines 238 to 254

cacheSyncs := []cache.InformerSynced{
c.nodeListerSynced,
c.serviceListerSynced,
c.bgpPolicyListerSynced,
c.endpointSliceListerSynced,
c.serviceListerSynced,
}
if c.egressEnabled {
cacheSyncs = append(cacheSyncs, c.egressListerSynced)
}
if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) {
return
}

go c.secretInformer.Run(ctx.Done())
cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cacheSyncs := []cache.InformerSynced{
c.nodeListerSynced,
c.serviceListerSynced,
c.bgpPolicyListerSynced,
c.endpointSliceListerSynced,
c.serviceListerSynced,
}
if c.egressEnabled {
cacheSyncs = append(cacheSyncs, c.egressListerSynced)
}
if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) {
return
}
go c.secretInformer.Run(ctx.Done())
cache.WaitForCacheSync(ctx.Done(), c.secretInformer.HasSynced)
go c.secretInformer.Run(ctx.Done())
cacheSyncs := []cache.InformerSynced{
c.nodeListerSynced,
c.serviceListerSynced,
c.bgpPolicyListerSynced,
c.endpointSliceListerSynced,
c.serviceListerSynced,
c.secretInformer.HasSynced,
}
if c.egressEnabled {
cacheSyncs = append(cacheSyncs, c.egressListerSynced)
}
if !cache.WaitForNamedCacheSync(controllerName, ctx.Done(), cacheSyncs...) {
return
}

This commit implements the controller of API `BGPPolicy`, designed to advertise
Service IPs, Egress IPs, and Pod IPs to BGP peers from selected Kubernetes Nodes.

According to the spec of `BGPPolicy`, the Node selector is used to select Nodes
to which a `BGPPolicy` is applied. Multiple `BGPPolicies` can be applied to the
same Node. However, only the oldest `BGPPolicy` will be effective on a Node,
with others serving as alternatives. The effective one may be changed in the
following cases:

- The current effective BGPPolicy is updated and not applied to the Node.
- The current effective BGPPolicy is deleted.

The BGP server instance is only created and started for the effective BGPPolicy on
a Node. If the effective BGPPolicy is changed, the corresponding BGP server instance
will be terminated by calling the `Stop` method, and a new BGP server instance will
be created and started by calling the `Start` method for the new effective BGPPolicy.

To create a BGP server instance, ASN, router ID, and listen port must be specified.
The ASN and listen port are specified in the spec of the effective BGPPolicy. For router ID,
if the Kubernetes cluster is IPv4-only or dual-stack, we use the Node's IPv4 address
as the router ID, ensuring uniqueness. If the Kubernetes cluster is IPv6-only, where no
Node IPv4 address is available, the router ID could be specified via the Node annotation
`node.antrea.io/bgp-router-id`. If not present, a router ID will be generated by hashing
the Node name and update it to the Node annotation `node.antrea.io/bgp-router-id`.
Additionally, the stale BGP server instance will be terminated and a new BGP server
instance should be created and started when any of ASN, routerID, or listen port changes.

The information of the BGP peers is specified in the effective BGPPolicy. The unique
identification of a BGP peer is the peer IP address and peer ASN.

To reconcile the latest BGP peers:

- Get the BGP peers to be added and add them by calling the `AddPeer` method of the
  BGP server instance.
- Get the BGP peers to be deleted and delete them by calling the `RemovePeer` method
  of the BGP server instance.
- Get the remaining BGP peers and calculate the updated BGP peers, then update them by
  calling the `UpdatePeer` method of the BGP server instance.

The information of the IPs to be advertised can be calculated from the spec of the
effective BGPPolicy. Currently, we advertise the IPs and CIDRs to all the BGP peers.

To reconcile the latest IPs to all BGP peers:

- If the BGP server instance is newly created and started, advertise all the IPs by
  calling the `AdvertiseRoutes` method.
- If the BGP server instance is not newly created and started:
  - Get the IPs/CIDRs to be added and advertise them by calling the `AdvertiseRoutes` method.
  - Get the IPs/CIDRs to be removed and withdraw them by calling the `WithdrawRoutes` method.

The feature is gated by the alpha `BGPPolicy` feature gate and only supported in Linux.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the great contribution!

return ""
}

func waitAndGetDummyEvent(t *testing.T, c *fakeController) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for a future PR, it may be good to rename this to waitEvent and doneDummyEvent could be renamed to removeEvent. They could also be defined as methods on fakeController.

@antoninbas
Copy link
Contributor

/test-all

@antoninbas antoninbas merged commit 905a8a6 into antrea-io:main Jul 25, 2024
55 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes. area/transit/bgp Issues or PRs related to BGP support. kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants