Node(s) CPU architecture, OS, and Version:
AMD x86_64, AWS EC2 m5.2xlarge, 5.10.167-147.601.amzn2.x86_64
Cluster Configuration:
Single node, cluster created using k3sup
Describe the bug:
TCP connections between pods in the cluster take a long time to establish, but are fast once established. For example, a database connection (the database also runs in the cluster) takes several retries to connect, but once it is up the queries run quickly.
I have done some debugging: running tcpdump on cni0 shows that almost all UDP and TCP packets coming into the interface are reported with incorrect checksums. I am not sure whether this is a symptom or the root cause. From what I have read about flannel, there have been issues in the past with offloading checksum calculation to the NIC. I tried turning off tx-checksum-ip-generic with ethtool, as suggested in those posts, but got nowhere.
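For context on what tcpdump is checking, here is a minimal sketch of the 16-bit one's-complement Internet checksum (RFC 1071) that IP, TCP and UDP use. The function name `internet_checksum` is made up for illustration. With checksum offload enabled, the sender typically leaves the final value for the device to fill in, so a capture taken before that step can show a mismatch even though nothing is corrupted.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        # add each big-endian 16-bit word
        total += (data[i] << 8) | data[i + 1]
    # fold any carries above 16 bits back into the low 16 bits
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# A commonly used example IPv4 header with the checksum field zeroed
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # 0xb861
```

A receiver validates by running the same sum over the data with the checksum field included; a result of 0 means the checksum is correct.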
Steps To Reproduce:
Provisioned an EC2 instance as described above
Ran k3sup to install k3s
Installed K3s with the following flags:
--disable traefik, (set of OIDC flags for kube apiserver), --secrets-encryption
Expected behavior:
TCP connections establish quickly between pods in the cluster.
Actual behavior:
TCP connections take a long time to establish requiring several retries.
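To put numbers on "a long time", a small probe along these lines could be run from inside a pod. This is only a sketch: `time_connect` and `sample_connects` are hypothetical helper names, and the target host and port are whatever service is slow to connect to.

```python
import socket
import time
from typing import List, Optional


def time_connect(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return seconds taken to complete the TCP handshake, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection returns once the handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # covers refused connections, timeouts and resolution failures
        return None


def sample_connects(host: str, port: int, attempts: int = 10,
                    timeout: float = 5.0) -> List[Optional[float]]:
    """Make several independent connection attempts and return each duration."""
    return [time_connect(host, port, timeout) for _ in range(attempts)]
```

For example, `sample_connects("<service-host>", <port>)` returns one duration per attempt. On Linux the first SYN retransmission normally happens after about 1 second and the interval doubles after that, so durations clustering near 1 s, 3 s or 7 s would suggest that the initial SYN (or the SYN-ACK) is being dropped rather than the peer being slow to respond.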
Additional context / logs:
As mentioned, I have already looked through a lot of information about flannel to try to debug this, but I can't work out why I am seeing this behavior.
Here is a screen capture of tcpdump output:
I can do some hacky grepping and see that some checksums are correct:
I have no idea if this is the cause of the issue or a symptom of some other misconfig.
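As an alternative to grepping, the correct and incorrect checksums could be tallied with a short script. This sketch assumes tcpdump's verbose text output (`-v` or `-vv`), where TCP lines carry `cksum 0x... (correct)` or `cksum 0x... (incorrect -> 0x...)` and UDP lines carry `[udp sum ok]` or `[bad udp cksum ...]`. The function name `count_checksums` is made up for illustration.

```python
import re
from collections import Counter

# Patterns assume tcpdump's verbose (-v / -vv) text format
OK = re.compile(r"\(correct\)|\[udp sum ok\]")
BAD = re.compile(r"\(incorrect -> 0x[0-9a-f]+\)|\[bad udp cksum")


def count_checksums(lines):
    """Count lines reporting a correct or an incorrect checksum."""
    counts = Counter()
    for line in lines:
        if BAD.search(line):
            counts["incorrect"] += 1
        elif OK.search(line):
            counts["correct"] += 1
    return counts
```

Passing `sys.stdin` to `count_checksums` while piping in `tcpdump -i cni0 -nn -vv` output would give the two totals. A high incorrect count on its own does not show that packets are being dropped, only that the checksum had not been filled in at the point of capture.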
I have also tried the host-gw backend and see the same behavior.
Thanks!
I have tried the newer operating system but got the same issue.
I just need someone to confirm that these checksum errors are expected on the cni0 interface. From further reading, I gather they are expected on veth devices, since there is no point in computing a checksum when nothing goes over the wire.
If this is the case, I will move on to try some other fixes.
Environmental Info:
K3s Version:
1.26.4