Windows Node fails to run Pod using the provided OVS binary #4702

Closed
wenyingd opened this issue Mar 15, 2023 · 0 comments · Fixed by #4705
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@wenyingd
Contributor

wenyingd commented Mar 15, 2023

Describe the bug

When running Pods on a Windows Node, I saw that the Pods were always stuck in "RunContainerError" status. Describing one of the Pods shows these events:

Events:
  Type     Reason  Age                  From     Message
  ----     ------  ----                 ----     -------
  Warning  Failed  22m (x12 over 100m)  kubelet  Error: context deadline exceeded
  Warning  Failed  8m (x23 over 106m)   kubelet  Error: context deadline exceeded
  Warning  Failed  2m (x16 over 108m)   kubelet  Error: context deadline exceeded

After checking the antrea-agent logs, it looks like antrea-agent successfully allocated an IP address to the Pod, and the HNSEndpoint, OVS port, and OpenFlow entries were installed on the Node. However, the containerd runtime does not respond to the call from kubelet.
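
For reference, those resources can be checked on the Windows Node roughly as follows (a minimal sketch, assuming the HNS PowerShell cmdlets are available, e.g. via the built-in HostNetworkingService module or the hns.psm1 helper, and that br-int is the default Antrea OVS bridge):

  # Confirm the Pod's HNSEndpoint was created
  Get-HnsEndpoint

  # Confirm the OVS port exists (lists bridges and their ports)
  ovs-vsctl.exe show

  # Confirm the OpenFlow entries were installed on the Antrea bridge
  ovs-ofctl.exe dump-flows br-int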

After deleting the Pod with "kubectl delete Pod xxx", the Pod is stuck in "Terminating" status. Re-checking the HNS resources shows that the HNSEndpoint was not deleted. After manually deleting the HNS Endpoint, PowerShell hangs with no response. The setup uses the default OVS binary provided via this link (https://github.com/antrea-io/antrea/blob/main/hack/windows/Install-OVS.ps1#L35).
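
The manual cleanup that hung looked roughly like the following (a sketch, not the exact commands from this environment; Remove-HnsEndpoint is assumed to come from the same HNS helpers, and the endpoint selection is a placeholder):

  # Find the stale endpoint left behind by the terminating Pod
  Get-HnsEndpoint

  # Deleting the stale endpoint is the call that hangs; <endpoint-id> is a placeholder
  Get-HnsEndpoint | Where-Object { $_.ID -eq "<endpoint-id>" } | Remove-HnsEndpoint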

This is a known OVS issue that can block the HNS service, and the fix has been merged in upstream OVS (openvswitch/ovs#385). Hence, the Antrea-provided OVS binary should include this fix.
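
Once a patched OVS package is published and the script's default download link is updated, re-running the helper script on the Windows Node should be enough to pick up the fix (a sketch, run from the antrea repository root; the script's defaults may differ by release):

  # Re-install OVS using the script's updated default download link
  .\hack\windows\Install-OVS.ps1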

To Reproduce

Expected

Actual behavior

Versions:

Additional context

@wenyingd wenyingd added the kind/bug Categorizes issue or PR as related to a bug. label Mar 15, 2023
XinShuYang added a commit to XinShuYang/antrea that referenced this issue Mar 15, 2023
Upgrade windows ovs to 2.16.7 on both script and CI testbeds.

Fixes antrea-io#4702

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
@tnqn tnqn closed this as completed in #4705 May 8, 2023
tnqn pushed a commit that referenced this issue May 8, 2023
* Update windows ovs download link to ovs 2.16.7

Upgrade windows ovs to 2.16.7 on both script and CI testbeds.

Fixes #4702

Signed-off-by: Shuyang Xin <gavinx@vmware.com>

* Update the topology of nodeportlocal e2e test

The client and server Pods should be on different Nodes to be closer
to real scenarios and to reduce unbalanced resources.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
ceclinux pushed a commit to ceclinux/antrea that referenced this issue Jun 5, 2023