[Windows] Improvement in cleanup script #4722
sc.exe failure ovs-vswitchd reset= 0 actions= restart/0/restart/0/restart/0
start-service ovs-vswitchd
$OVS_VERSION=$(Get-Item $OVSInstallDir\driver\OVSExt.sys).VersionInfo.ProductVersion
ovs-vsctl --no-wait set Open_vSwitch . ovs_version=$OVS_VERSION
Why do we need to set ovs_version here? Will restarting ovs service remove version info in ovsdb?
The data is not lost if we only restart the ovs-vswitchd process/service. But if we re-create the OVS schema file (called in line 92 when ovs-vswitchd has failed), this data is lost because we have deleted db.conf. antrea-agent needs this version for the monitoring CR, and it may crash if the version does not exist.
@wenyingd @XinShuYang you can use the label |
Codecov Report
@@ Coverage Diff @@
## main #4722 +/- ##
==========================================
- Coverage 71.40% 71.36% -0.05%
==========================================
Files 409 406 -3
Lines 61208 63087 +1879
==========================================
+ Hits 43706 45020 +1314
- Misses 14547 15089 +542
- Partials 2955 2978 +23
*This pull request uses carry forward flags.
LGTM
}

RemoveNetworkAdapter $OVS_BR_ADAPTER
RemoveNetworkAdapter "antrea-gw0"
Could you add a commit message to clarify why vNic was removed from Windows VMSwitch and why netadapter such as antrea-gw0 doesn't need to be removed in script any more?
added.
in the commit message:
s/other than calling ovs-vsctl commands/rather than calling ovs-vsctl commands
stop-service ovs-vswitchd
sc.exe delete ovs-vswitchd
stop-service ovsdb-server
sc.exe delete ovsdb-server
will this fail if one of the services has already been stopped or deleted?
It would not fail if the service is already stopped. For a nonexistent (deleted) service, yes, it may fail, but the failure does not block the commands that follow.
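As an illustration of the point above, a cleanup step can check whether a service exists before stopping and deleting it, so that a missing service is reported rather than surfacing as an error. This is only a sketch: the helper name Remove-ServiceIfPresent is hypothetical and is not part of this PR, which calls stop-service and sc.exe delete directly.

```powershell
# Hypothetical helper (not in the PR): stop and delete a Windows service
# only if it is currently registered.
function Remove-ServiceIfPresent {
    param([Parameter(Mandatory)][string]$Name)

    # With -ErrorAction SilentlyContinue, a missing service yields no output
    # instead of a visible error.
    $svc = Get-Service -Name $Name -ErrorAction SilentlyContinue
    if ($null -eq $svc) {
        Write-Host "Service $Name not found, nothing to remove"
        return $false
    }

    # Stopping an already stopped service is harmless, but skip it anyway.
    if ($svc.Status -ne "Stopped") {
        Stop-Service -Name $Name -Force -ErrorAction SilentlyContinue
    }

    # sc.exe reports success or failure through its exit code.
    sc.exe delete $Name | Out-Null
    return ($LASTEXITCODE -eq 0)
}
```

A caller would invoke it once per service, for example `Remove-ServiceIfPresent -Name "ovs-vswitchd"` followed by `Remove-ServiceIfPresent -Name "ovsdb-server"`.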
return
}
$ovsStatus = $(Get-Service ovs-vswitchd).Status
if ("$ovsStatus" -EQ "StartPending") {
if ("$ovsStatus" -ne "Running") {
sc.exe delete ovs-vswitchd
Would it be safer to add stop-service ovs-vswitchd before this line? Or is this guaranteed to succeed?
I don't think it is necessary: the current status is not "running", so stop-service would have no effect.
return
}
$ovsStatus = $(Get-Service ovs-vswitchd).Status
if ("$ovsStatus" -EQ "StartPending") {
if ("$ovsStatus" -ne "Running") {
is the status case-insensitive? I ask because I notice that you use "running" instead later in the script.
Windows is case-insensitive.
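More precisely, the behavior comes from PowerShell's comparison operators: -eq and -ne ignore case for strings by default, and the c-prefixed variants (-ceq, -cne) are case-sensitive. A small sketch, where the literal "Running" stands in for the quoted status the script reads from Get-Service:

```powershell
# Stand-in for "$((Get-Service ovs-vswitchd).Status)"; the value is assumed here.
$status = "Running"

# Default operators compare strings case-insensitively.
$insensitiveEqual    = $status -eq "running"    # True
$insensitiveNotEqual = $status -ne "RUNNING"    # False

# The c-prefixed operator is case-sensitive.
$sensitiveEqual      = $status -ceq "running"   # False
```

So comparing against "Running" or "running" gives the same result as long as the script uses -eq or -ne.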
hack/windows/Clean-AntreaNetwork.ps1
Outdated
if ($vmSwitch -ne $null) {
Write-Host "Remove vNICs"
Remove-VMNetworkAdapter -SwitchName $AntreaHnsNetworkName -ManagementOS -Confirm:$false -ErrorAction SilentlyContinue
$hnsNetwork = Get-HnsNetwork | ? Name -eq $AntreaHnsNetworkName
I notice that we use a slightly different command in another script:
hack/windows/Prepare-AntreaAgent.ps1:$AntreaHnsNetwork = Get-HnsNetwork | Where-Object {$_.Name -eq "antrea-hnsnetwork"}
The two commands are the same: ? is an alias of Where-Object. I will update it to keep the scripts consistent.
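The equivalence of the two forms can be checked without any HNS network present. In the sketch below, the $networks objects are hypothetical stand-ins for the output of Get-HnsNetwork; only the filtering syntax is taken from the two scripts.

```powershell
# Hypothetical stand-in data; the real scripts pipe the output of Get-HnsNetwork.
$networks = @(
    [pscustomobject]@{ Name = "antrea-hnsnetwork" },
    [pscustomobject]@{ Name = "other-network" }
)

# Short form used in Clean-AntreaNetwork.ps1: the ? alias with simplified syntax.
$short = $networks | ? Name -eq "antrea-hnsnetwork"

# Long form used in Prepare-AntreaAgent.ps1: Where-Object with a script block.
$long = $networks | Where-Object { $_.Name -eq "antrea-hnsnetwork" }

# List the aliases that resolve to Where-Object.
Get-Alias -Definition Where-Object | Select-Object -ExpandProperty Name
```

Both $short and $long select the same single object, so the choice between the two forms is purely a matter of consistency and readability.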
hack/windows/Clean-AntreaNetwork.ps1
Outdated
# This might happen after the Windows host is restarted abnormally, in which case some stale configurations block
# ovs-vswitchd running, like the pid file and the misconfigurations in OVSDB.
in which case some stale configurations can prevent ovs-vswitchd from running, like a stale pid file or misconfigurations in OVSDB.
updated.
hack/windows/Clean-AntreaNetwork.ps1
Outdated
@@ -6,10 +6,16 @@
OVS installation directory. It is the path argument when using Install-OVS.ps1. The default path is "C:\openvswitch".
.PARAMETER RenewIPConfig
Renew the ipconfig on the host. The default value is $false.
.PARAMETER RemoveOVS
Remove ovsdb-server and ovs-vswitchd services on the host. The default value is $false. If this argument is set as
s/on the host/from the host
updated.
Typo in summary and commit messages: |
ece97c5 to faa6b0c
In PR description / commit message:
- s/The existing script would reset ovs-vswitchd is stucking in "starting" status/The existing script would reset ovs-vswitchd when stuck in "starting" status - I think that's what you meant, not totally sure
- s/if they are not existing/if they do not exist
- s/remove the dependency of/remove the dependency on
Otherwise, LGTM
The existing script would reset ovs-vswitchd when stuck in "starting" status. In some corner cases, ovsdb-server/ovs-vswitchd services may be removed unexpectedly.
This change includes,
- add improvement to recover ovsdb-server/ovs-vswitchd services if they do not exist when running the cleanup script.
- remove vNICs using Windows VMSwitch API rather than calling ovs-vsctl commands; this removes the dependency on the running OVS userspace process (ovs-vswitchd).
Signed-off-by: wenyingd <wenyingd@vmware.com>
@wenyingd @XinShuYang is there any Jenkins test that uses this script and that we should run before merging?
@antoninbas No, all Windows testbeds use snapshots to clean the environment.
/skip-all
The existing script would reset ovs-vswitchd when stuck in "starting" status. In some corner cases, ovsdb-server/ovs-vswitchd services may be removed unexpectedly.
This change includes,
Fix: #4721