-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refine Endpoint selection for Multi-cluster Service #4693
Conversation
31f9364
to
cd7e9e3
Compare
pkg/agent/openflow/client.go
Outdated
// UninstallServiceFlows removes flows installed by InstallServiceFlows. | ||
UninstallServiceFlows(svcIP net.IP, svcPort uint16, protocol binding.Protocol) error | ||
|
||
// InstallServiceClusterIPFlows install flows for accessing Endpoints which is Service's ClusterIP, and has an action | ||
// to Service's corresponding group. | ||
InstallServiceClusterIPFlows(svcIP net.IP, svcPort uint16, protocol binding.Protocol, groupID binding.GroupIDType) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought it over again, perhaps it's simpler to just generate one more flow in InstallServiceFlows
directly.
func InstallServiceFlows() {
...
if svcType == v1.ServiceTypeClusterIP && !nested {
flows = append(flows, c.featureService. endpointRedirectFlowForServiceIP(svcIP, svcPort, protocol, groupID))
}
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, I will update it.
daac839
to
7e43f5c
Compare
pkg/agent/proxy/proxier.go
Outdated
// Install ClusterIP flows for the Service. | ||
groupID := p.groupCounter.AllocateIfNotExist(svcPortName, internalPolicyLocal) | ||
if err := p.ofClient.InstallServiceFlows(groupID, svcInfo.ClusterIP(), uint16(svcInfo.Port()), svcInfo.OFProtocol, uint16(affinityTimeout), externalPolicyLocal, corev1.ServiceTypeClusterIP); err != nil { | ||
if err := p.ofClient.InstallServiceFlows(groupID, svcInfo.ClusterIP(), uint16(svcInfo.Port()), svcInfo.OFProtocol, uint16(affinityTimeout), externalPolicyLocal, corev1.ServiceTypeClusterIP, hasNestedService); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new parameter should be a pointer
- nil, do nothing in
InstallServiceFlows
. The Service is not related to nested Service, like NodePort or LoadBalancer. - true, install extra flow in EndpointDNATTable. The Service could be used as Endpoint of MC Service.
- false, add extra action NestedServiceRegMark to the flow in ServiceLBTable. The Service is a MC Service.
In addition, this pointer should be always nil when MC is not enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this make the case more complicated, we only need to handle true
or false
case. There is actually no difference for nil
and false
since NodePort or LoadBalancer Service also have ClusterIP which can be exported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember that you said at least now only ClusterIP can be used as the Endpoint of MC Service according to Kubernetes KEP-xxxx. If so, I think we should only enhance the flow of pure ClusterIP Service and skip the ClusterIP in NodePort or LoadBalancer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I mean ClusterIP
, not ClusterIP type of Service, since both NodePort and LoadBalancer have ClusterIP, I mean ClusterIP can be exported instead of NodeIP:NodePort
or "LBIP:Port". KEP-1645 also suggested to use ClusterIP of NodePort and LoadBalancer Services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we support that using ClusterIP of NodePort or LoadBalancer as Endpoint of MC-Service currently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, there are integration test failures.
823e868
to
32f91ce
Compare
pkg/agent/proxy/types/types.go
Outdated
// NestedServiceSupport means the Service has an annotation "multicluster.antrea.io/imported-service" | ||
// which means its Endpoints might be another Serivce's ClusterIP. | ||
NestedServiceSupport bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// NestedServiceSupport means the Service has an annotation "multicluster.antrea.io/imported-service" | |
// which means its Endpoints might be another Serivce's ClusterIP. | |
NestedServiceSupport bool | |
// IsNested means the Service's Endpoints could be another Service's ClusterIP. | |
// Currently it's true for Multicluster Service, determined by whether there is a Multicluster specific annotation. | |
IsNested bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if svcType == v1.ServiceTypeClusterIP && !nested { | ||
flows = append(flows, c.featureService.endpointRedirectFlowForServiceIP(svcIP, svcPort, protocol, groupID)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if svcType == v1.ServiceTypeClusterIP && !nested { | |
flows = append(flows, c.featureService.endpointRedirectFlowForServiceIP(svcIP, svcPort, protocol, groupID)) | |
} | |
if svcType == v1.ServiceTypeClusterIP && nested != nil && !*nested { | |
flows = append(flows, c.featureService.endpointRedirectFlowForServiceIP(svcIP, svcPort, protocol, groupID)) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this way, I think we can skip the ClusterIP flow of NodePort or LoadBalancer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in previous comment, I feel there is no need to skip it. They will be be supported via ClusterIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only gap here is that we didn't officially announce that we support these two kinds of Service in multi-cluster docs, I probably will update it after double check MC controller codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't officially announce and support that , maybe we shouldn't introduce extra flows which are not used totally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned, there are probably just doc updates involved to support NodePort or LoadBalancer with ClusterIP, so I prefer not to skip it, otherwise we may change this part back soon. If you have further concern, we can sync offline, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ClusterIP from NodePort or LoadBalancer will be supported eventually, I'm ok with that.
32f91ce
to
49ede2d
Compare
49ede2d
to
2ebb729
Compare
/test-multicluster-e2e |
2ebb729
to
966807c
Compare
pkg/agent/proxy/types/types.go
Outdated
@@ -22,16 +22,25 @@ import ( | |||
k8sproxy "antrea.io/antrea/third_party/proxy" | |||
) | |||
|
|||
const AntreaMCServiceAnnotation = "multicluster.antrea.io/imported-service" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define it in pkg/agent/multicluster
? Even consider defining a func there like IsMulticlusterService
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, add a new func.
966807c
to
c8e693b
Compare
/test-multicluster-e2e |
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@hongliangl : any further comments? |
Add a new flow for the Service's ClusterIP in the EndpointDNAT table with group action. When an Endpoint of a Multi-cluster Service is a local Service ClusterIP and being selected, it will go to the corresponding exported Service's group to select the final Endpoint. This can avoid that the traffic goes out of the OVS bridge from antrea-gw0 (and handled by kube-proxy when it is running) and comes back again. The proposal details can be found in the comment: antrea-io#4508 (comment) Signed-off-by: Lan Luo <luola@vmware.com>
c8e693b
to
b2dc841
Compare
Conflicts resolved. /test-multicluster-e2e |
/test-multicluster-e2e |
1 similar comment
/test-multicluster-e2e |
Seems the testbed is not stable again, the e2es take much longer time which is 2 times than before, and NP tests failed with timeout, I will check with @hjiajing. |
/test-multicluster-e2e |
4 similar comments
/test-multicluster-e2e |
/test-multicluster-e2e |
/test-multicluster-e2e |
/test-multicluster-e2e |
I have tried to trigger MC e2e locally with this branch, the results looks fine, all tests are passed. The latest triggered jenkins job might be back to normal. @tnqn you may either merged this or wait for the latest build result from Jenkins. Thanks.
|
* Revert "Refine Endpoint selection for multi-cluster Service (antrea-io#4508)" This reverts commit 6cdbca3. Signed-off-by: Lan Luo <luola@vmware.com> * Refine Endpoint selection for MC Service Add a new flow for the Service's ClusterIP in the EndpointDNAT table with group action. When an Endpoint of a Multi-cluster Service is a local Service ClusterIP and being selected, it will go to the corresponding exported Service's group to select the final Endpoint. This can avoid that the traffic goes out of the OVS bridge from antrea-gw0 (and handled by kube-proxy when it is running) and comes back again. The proposal details can be found in the comment: antrea-io#4508 (comment) Signed-off-by: Lan Luo <luola@vmware.com> --------- Signed-off-by: Lan Luo <luola@vmware.com>
In order to decouple Multi-cluster with Antrea Proxy and refine Endpoint selection for Multi-cluster Service, following changes are included in this PR:
group action. When an Endpoint of a Multi-cluster Service is a local Service
ClusterIP and being selected, it will go to the corresponding exported Service's
group to select the final Endpoint. This can avoid that the traffic goes out of the
OVS bridge from antrea-gw0 (and handled by kube-proxy when it is running) and
comes back again.
The proposal details can be found in the comment:
#4508 (comment)