Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ClusterInfo type ResourceExport recreation bug #4412

Merged
merged 1 commit into from
Dec 5, 2022

Conversation

luolanzone
Copy link
Contributor

@luolanzone luolanzone commented Nov 24, 2022

After Gateway HA is enabled, the ClusterInfo type of ResourceExport will be recreated when the active Gateway is changed. But there is a case that a new ClusterInfo of ResourceExport creation may fail when the leader controller process is slow and existing ResourceExport is not deleted in time.
The root cause is the Gateway creation event will update a ClusterInfo of ResourceExport instead of creating a new one due to it still can get existing one from cache when the ResourceExport's DeletionTimestamp is not zero, after ResourceExport is updated, it will be recycled by kubernetes very quickly since it's stale object. From Gateway controller's perspective, Gateway change is reflecting in ResourceExport, but it's actually being deleted eventually.
Fix the issue through retry when the existing ResourceExport's DeletionTimestamp is not zero.

Signed-off-by: Lan Luo luola@vmware.com

@luolanzone luolanzone added area/multi-cluster Issues or PRs related to multi cluster. action/backport Indicates a PR that requires backports. labels Nov 24, 2022
@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

@codecov
Copy link

codecov bot commented Nov 24, 2022

Codecov Report

Merging #4412 (407ecda) into main (b977b1d) will decrease coverage by 0.02%.
The diff coverage is 40.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4412      +/-   ##
==========================================
- Coverage   65.56%   65.54%   -0.03%     
==========================================
  Files         400      412      +12     
  Lines       56847    56941      +94     
==========================================
+ Hits        37270    37320      +50     
- Misses      16882    16941      +59     
+ Partials     2695     2680      -15     
Flag Coverage Δ
e2e-tests 61.57% <40.00%> (?)
integration-tests 34.54% <0.00%> (+0.04%) ⬆️
kind-e2e-tests 47.45% <ø> (-0.67%) ⬇️
unit-tests 49.81% <40.00%> (+<0.01%) ⬆️
Impacted Files Coverage Δ
...ter/controllers/multicluster/gateway_controller.go 84.09% <40.00%> (+14.23%) ⬆️
pkg/agent/cniserver/ipam/antrea_ipam.go 52.81% <0.00%> (-22.95%) ⬇️
pkg/agent/controller/networkpolicy/reject.go 63.05% <0.00%> (-12.81%) ⬇️
pkg/agent/openflow/multicast.go 19.88% <0.00%> (-8.25%) ⬇️
pkg/agent/types/networkpolicy.go 89.74% <0.00%> (-4.86%) ⬇️
pkg/agent/cniserver/server.go 74.94% <0.00%> (-4.05%) ⬇️
pkg/controller/networkpolicy/store/addressgroup.go 88.37% <0.00%> (-3.49%) ⬇️
...agent/flowexporter/connections/deny_connections.go 87.09% <0.00%> (-3.23%) ⬇️
pkg/agent/openflow/service.go 88.54% <0.00%> (-3.13%) ⬇️
pkg/agent/openflow/pipeline.go 82.11% <0.00%> (-2.76%) ⬇️
... and 54 more

@@ -113,6 +113,11 @@ func (r *GatewayReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ct
}
return nil
}

if !existingResExport.DeletionTimestamp.IsZero() {
return fmt.Errorf("existing ResourceExport is being deleted, retry later")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - if we do not add this check, will updateResourceExport() fail? Then what is the difference of returning an error here, compared to letting updateResourceExport() fail and return an error at line 122?

Have you checked how long the new active GW can be finally recreated, when the race condition happens and we have to retry here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateResourceExport() will not fail because update is allowed when DeletionTimestamp is not empty. The updated one will be recycled by K8s because DeletionTimestamp is not zero after that.
The new active Gateway is created immediately when old Gateway is removed or down. The corresponding ClusterInfo type of ResourceExport will also be created almost immediately because it's usually one time retry unless the leader controller is down.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, could there be another race condition that the check is passed here, but DeletionTimeStamp is set before or after update is done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we only have one goroutine in gateway controller to handle Gateway change considering Gateway event rarely happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you get the ResourceExport from client cache right? The state can be stale?

When the Gateway changes, should we create the ResourceExport rather than update it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling client cache won't resolve it as the object in apiserver may be updated after the API is called.
This situation is very common and a typical way to resolve it is, using the object in the cache without concerning if it's stale or not, updating the object with resourceVersion got from the cache (clientset automatically does it as long as the object has resourceVersion), then if the object is stale, apiserver will reject the update request with updateConflict error, and client should retrieve the latest version from apiserver directly and retry. Example: https://github.com/antrea-io/antrea/blob/main/pkg/agent/controller/egress/egress_controller.go#L561

But I wonder even if it updates the resource export successfully, could the other slow leader delete it because it thinks this is the one for itself? If yes, the deletion logic should check whether the object matches and set resourceVersion when sending the delete request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnqn thanks for the suggestion. I am not sure I got your last question about other slow leader, there is only one leader controller in one ClusterSet, and each member cluster will create its own ClusterInfo type of ResourceExport and also 1-1 mapping ResourceImport. Only when member cluster submits the deletion request, the leader controller will delete corresponding ClusterInfo of ResourceExport. But I think we can set resourceVersion for delete request considering it should have no side-effect.

Copy link
Contributor Author

@luolanzone luolanzone Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns I have refined the code to create ResourceExport when DeletionTimestamp != 0. After checking existing codes, it already has resourceVersion when it's trying to update existing ResourceExport, so it should fail when there is conflict and retry. To avoid stale cache, I removed the spec comparison steps and let it update anyway. For the deletion action, I feel it's Ok without resourceVersion considering Gateway webhook will guarantee that only one Gateway is allowed in a member cluster. all events should happen by sequence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still keep the resourceVersion in update or not? If yes, maybe add a comment before the update call about that.

@tnqn : the controller has only a single worker, so no synchronization issue or reordering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns yes, the updateResourceExport will reuse existing ResourceExport's resourceVersion to update. I have added a comment for this.

@luolanzone luolanzone force-pushed the fix-clusterinfo-recreate-bug branch 3 times, most recently from 950ad3e to 9ffb265 Compare December 2, 2022 07:19
jianjuns
jianjuns previously approved these changes Dec 2, 2022
After Gateway HA is enabled, the ClusterInfo type of ResourceExport
will be recreated when the active Gateway is changed. But there is a case
that a new ClusterInfo of ResourceExport creation may fail when the
leader controller process is slow and existing ResourceExport is not
deleted in time.
Fix the issue through retry when the existing ResourceExport's DeletionTimestamp
is not zero.

Signed-off-by: Lan Luo <luola@vmware.com>
@luolanzone
Copy link
Contributor Author

/test-multicluster-e2e

@tnqn tnqn added this to the Antrea v1.10 release milestone Dec 5, 2022
@jianjuns
Copy link
Contributor

jianjuns commented Dec 5, 2022

/skip-all

@jianjuns jianjuns merged commit 330a7de into antrea-io:main Dec 5, 2022
@luolanzone luolanzone deleted the fix-clusterinfo-recreate-bug branch December 8, 2022 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/backport Indicates a PR that requires backports. area/multi-cluster Issues or PRs related to multi cluster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants