Fix race conditions in NetworkPolicyController #4028

Merged
merged 1 commit into from
Aug 16, 2022

Commits on Aug 16, 2022

  1. Fix race conditions in NetworkPolicyController

    There were a few race conditions in NetworkPolicyController:
    * An AppliedToGroup or AddressGroup in use may be removed if a
    situation like this happens:
    1. addANP creates a group for ANP A;
    2. addNetworkPolicy reuses the group for KNP B and is about to create
       an internal NetworkPolicy;
    3. deleteANP deletes the group for ANP A because at that moment no other
       internal NetworkPolicies are using the group;
    4. addNetworkPolicy commits the internal NetworkPolicy for KNP B to
       storage, but the group no longer exists.
    
    * An Antrea-native NetworkPolicy may be out-of-date if a situation like
    this happens:
    1. An ACNP event is received; `updateCNP` calculates the new internal
       NetworkPolicy for the ACNP and is about to commit it to storage;
    2. A ClusterGroup event triggers an update of the ACNP via
       triggerCNPUpdates;
    3. triggerCNPUpdates calls reprocessCNP which updates the new internal
       NetworkPolicy for the ACNP and commits it to storage;
    4. updateCNP from the first step commits its now-stale internal
       NetworkPolicy to storage, which overwrites the update from the
       ClusterGroup event.
    
    The second one caused the test case
    "TestGroupNoK8sNP/Case=ACNPNestedClusterGroup" to flake.
    
    To resolve the race conditions completely and make NetworkPolicy
    handling less error prone, this patch refactors NetworkPolicyController:
    * Event handlers no longer update the storage of internal NetworkPolicy
      directly; they only trigger a resync of the affected policies, which
      ensures that at most one worker handles an internal NetworkPolicy at
      any moment.
    * Ensure atomicity when updating internal NetworkPolicy and creating or
      deleting AddressGroups and AppliedToGroups.
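
The two points above can be sketched roughly as follows. This is not the code in the patch: the `controller` type and its `enqueue`, `syncAll`, and `applyLocked` methods are hypothetical, and a single mutex is assumed here where a per-key work queue would normally be used. Handlers only mark a key as pending; the sync step recomputes from the latest desired state and adjusts group references in the same critical section as the policy write.

```go
package main

import (
	"fmt"
	"sync"
)

// controller is a hypothetical, heavily simplified stand-in for
// NetworkPolicyController.
type controller struct {
	mu       sync.Mutex
	pending  map[string]bool   // policy keys waiting for a resync (deduplicated)
	policies map[string]string // internal policy key -> group it references
	groups   map[string]int    // group name -> number of policies using it
}

func newController() *controller {
	return &controller{
		pending:  map[string]bool{},
		policies: map[string]string{},
		groups:   map[string]int{},
	}
}

// enqueue is all an event handler does: it never writes to storage.
func (c *controller) enqueue(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending[key] = true
}

// syncAll reconciles every pending key against the latest desired state.
// A key missing from desired means the policy was deleted.
func (c *controller) syncAll(desired map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key := range c.pending {
		delete(c.pending, key)
		c.applyLocked(key, desired[key])
	}
}

// applyLocked updates the policy and its group references together.
// The caller must hold c.mu. The new reference is taken before the old one is
// dropped, and a group is removed only when its count reaches zero.
func (c *controller) applyLocked(key, group string) {
	old, existed := c.policies[key]
	if existed && old == group {
		return
	}
	if group != "" {
		c.groups[group]++
		c.policies[key] = group
	} else {
		delete(c.policies, key)
	}
	if existed {
		c.groups[old]--
		if c.groups[old] == 0 {
			delete(c.groups, old)
		}
	}
}

func main() {
	c := newController()
	desired := map[string]string{"anp-a": "g1"}
	c.enqueue("anp-a")
	c.syncAll(desired)

	// KNP B starts sharing g1 while ANP A is deleted.
	desired["knp-b"] = "g1"
	delete(desired, "anp-a")
	c.enqueue("knp-b")
	c.enqueue("anp-a")
	c.syncAll(desired)

	fmt.Println(c.groups["g1"], len(c.policies)) // prints 1 1
}
```

Whichever of the two pending keys is processed first, the group is still present once the lock is released, because the policy write and the group bookkeeping can no longer be interleaved with another writer.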
    
    Duplicate code and tests are deleted with the refactoring.
    
    Signed-off-by: Quan Tian <qtian@vmware.com>
    tnqn committed Aug 16, 2022
    d483bf7