Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #5739: Store NetworkPolicy in filesystem as fallback data source #5777: Enable Pod network after realizing initial NetworkPolicies #5795: Support Local ExternalTrafficPolicy for Services with #5798: Fix unit test TestReconcile #5833: Enable IPv4/IPv6 forwarding on demand automatically #5861

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jan 10, 2024

Cherry pick of #5739 #5777 #5795 #5798 #5833 on release-1.14.

#5739: Store NetworkPolicy in filesystem as fallback data source
#5777: Enable Pod network after realizing initial NetworkPolicies
#5795: Support Local ExternalTrafficPolicy for Services with
#5798: Fix unit test TestReconcile
#5833: Enable IPv4/IPv6 forwarding on demand automatically

For details on the cherry pick process, see the cherry pick requests page.

@tnqn tnqn added the kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release label Jan 10, 2024
In the previous implementation, traffic from/to a Pod may bypass
NetworkPolicies applied to the Pod in a time window when the agent
restarts because realizing NetworkPolicies and enabling forwarding are
asynchronous.

This patch stores NetworkPolicy data in files when they are received,
and makes antre-agent fallback to use the files as data source if it
can't connect to antrea-controller on startup. This prevents security
regression: a NetworkPolicy that has been realized on a Node will
continue to work even if antrea-controller is not available after
antrea-agent restarts.

The benchmark results of the storage's operations are as below:

BenchmarkFileStoreAddNetworkPolicy-40              70383             16102 ns/op             520 B/op          9 allocs/op
BenchmarkFileStoreAddAppliedToGroup-40             45382             25880 ns/op            3019 B/op          9 allocs/op
BenchmarkFileStoreAddAddressGroup-40                7400            180000 ns/op           49538 B/op          9 allocs/op
BenchmarkFileStoreReplaceAll-40                       10         127088004 ns/op        17815943 B/op      33099 allocs/op

The disk usage when storing 1k NetworkPolicies, AddressGroups, and
AppliedToGroups created by BenchmarkFileStoreReplaceAll is as below:

16M     /var/run/antrea-test/file-store/address-groups
4.0M    /var/run/antrea-test/file-store/applied-to-groups
4.0M    /var/run/antrea-test/file-store/network-policies

Signed-off-by: Quan Tian <qtian@vmware.com>
Pod network should only be enabled after realizing initial
NetworkPolicies, otherwise traffic from/to Pods could bypass
NetworkPolicy when antrea-agent restarts.

After commit f9fc979 ("Store NetworkPolicy in filesystem as
fallback data source"), antrea-agent can realize either the latest
NetworkPolicies got from antrea-controller or the ones got from
filesystem as fallback. Therefore, waiting for NetworkPolicies to be
realized should not add marked delay or make antrea-controller a failure
point of Pod network.

This commit adds an implementation of wait group capable of waiting with
a timeout, and uses it to wait for common initialization and
NetworkPolicy realization before installing any flows for Pods. More
preconditions can be added via the wait group if needed in the future.

Signed-off-by: Quan Tian <qtian@vmware.com>
Since K8s 1.29, setting Local ExternalTrafficPolicy for ClusterIP
Services with ExternalIPs is supported.

Signed-off-by: Quan Tian <qtian@vmware.com>
cniServer.reconcile() now installs flows asynchorously.

Signed-off-by: Quan Tian <qtian@vmware.com>
Although it has been documented as a prerequisite in [1], there are
some platforms not enabling ip forwarding by default. kube-proxy ipvs
mode and some CNIs enable it by themselves to ensure Pod networking
work properly.

As Antrea needs IP forwarding to be enabled, there seems no reason to
not do it by itself, rather than expecting users or other components to
do it.

[1] https://kubernetes.io/docs/setup/production-environment/container-runtimes/#install-and-configure-prerequisites

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn tnqn force-pushed the automated-cherry-pick-of-#5739-#5777-#5795-#5798-#5833-upstream-release-1.14 branch from a1d180a to a71c40d Compare January 10, 2024 13:45
@tnqn
Copy link
Member Author

tnqn commented Jan 11, 2024

/test-all

@tnqn tnqn merged commit 9985032 into antrea-io:release-1.14 Jan 12, 2024
43 of 51 checks passed
@tnqn tnqn deleted the automated-cherry-pick-of-#5739-#5777-#5795-#5798-#5833-upstream-release-1.14 branch January 12, 2024 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants