Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[voq][chassis][dhcp_relay] swss.sh try to start the dhcp_relay service although it is masked #18829

Merged
merged 2 commits into from
Aug 21, 2024

Conversation

mlok-nokia
Copy link
Contributor

@mlok-nokia mlok-nokia commented Apr 30, 2024

Why I did it

on Master branch, dhcp_relay is not supported in VOQ chassis. It is disabled in the FEATURE table. But based on the dependency, swss.sh always call "systemctl start" it although it's service file has been masked/disabled. The following error is logged in syslog which causes the logAnalyze failed on some of the OC tests. Fix issue #18822

Work item tracking
  • Microsoft ADO (number only):

How I did it

Added code to swss.sh to check if service is disabled or not. If it is disabled, do not start the service although it is in the DPENDENT_LIST. This avoids the ERROR log shown up in the syslog file

How to verify it

  1. Reboot the system or execut the config reload. The following error messages should not be seen in the syslog file
Apr 27 23:15:14.472425 ixre-egl-board7 ERR systemctl[7299]: Failed to start dhcp_relay.service: Unit dhcp_relay.service is masked.
Apr 27 23:15:14.476897 ixre-egl-board7 ERR systemctl[7298]: Failed to start dhcp_relay.service: Unit dhcp_relay.service is masked.

Which release branch to backport (provide reason below if selected)

This issue only exist in Master branch which is using the newer version of kernel.

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202405

Tested branch (Please provide the tested image version)

tested on Master branch.

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@rlhui rlhui requested a review from kellyyeh May 1, 2024 17:13
@rlhui
Copy link
Contributor

rlhui commented May 1, 2024

@kellyyeh please review this at earliest, thanks.

Copy link
Contributor

@wenyiz2021 wenyiz2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to keep an eye on this PR.

we have seen similar issue where sup has bgp disabled but not masked, and swss startup also starts bgp which caused an issue. there is #15734 to fix that.

this PR LGTM as a more general fix.

@mlok-nokia
Copy link
Contributor Author

@lguohan @kellyyeh Please help to review and merge this PR. Thanks

judyjoseph
judyjoseph previously approved these changes May 10, 2024
Copy link
Contributor

@judyjoseph judyjoseph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the only case is if the service is delayed and hostcfg sets the FEATURE state to enable/disable later -- which seems not the case with dhcp_relay service.

@yxieca yxieca requested a review from yaqiangz May 15, 2024 18:20
@wenyiz2021
Copy link
Contributor

hi @lguohan / @yxieca , could you help merge?

@yxieca
Copy link
Contributor

yxieca commented May 15, 2024

hi @lguohan / @yxieca , could you help merge?

This change is generally in the right direction. I am surprised that it only affected dhcp relay. @qiluo-msft , @saiarcot895 do you have other concerns for the change?

@wenyiz2021
Copy link
Contributor

hi @lguohan / @yxieca , could you help merge?

This change is generally in the right direction. I am surprised that it only affected dhcp relay. @qiluo-msft , @saiarcot895 do you have other concerns for the change?

we have seen similar issue where sup has bgp disabled but not masked, and swss startup also starts bgp which caused an issue. there was #15734 to fix that.
current PR fix looks more general

@saiarcot895
Copy link
Contributor

saiarcot895 commented May 15, 2024

For swss, there's also the docker-wait-any script that gets started as part of the wait command (this basically checks to see if one of the dependent containers have exited; if so, it brings down this container). That might need to be modified as well here?

files/scripts/swss.sh Outdated Show resolved Hide resolved
@mlok-nokia
Copy link
Contributor Author

For swss, there's also the docker-wait-any script that gets started as part of the wait command (this basically checks to see if one of the dependent containers have exited; if so, it brings down this container). That might need to be modified as well here?

For this particular issue, it only happens on the variable "DEPENDENT". The docker-wait-any is using a variable "MULTI_INST_DEPENDENT". No need to modify for docker-wait-any.

Copy link
Contributor

@yaqiangz yaqiangz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rlhui
Copy link
Contributor

rlhui commented Jun 19, 2024

@prabhataravind would you be able to review/approve, so that this PR could be merged? thanks.

…e althoug it is masked

Signed-off-by: mlok <marty.lok@nokia.com>
Copy link
Contributor

@prabhataravind prabhataravind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest change looks good to me.

@prabhataravind
Copy link
Contributor

prabhataravind commented Jun 26, 2024

@prabhataravind would you be able to review/approve, so that this PR could be merged? thanks.

@rlhui Reviewed and approved the latest diff.

/bin/systemctl start ${dep}
if [[ $dep == "dhcp_relay" ]]; then
state=$(is_feature_enabled $dep)
if [[ $state == "true" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After starting the service, should we enable the feature flag for this service?

@yaqiangz
Copy link
Contributor

Hi @yxieca are we good to merge this PR?

@yxieca yxieca merged commit c5a620b into sonic-net:master Aug 21, 2024
20 checks passed
matiAlfaro pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Aug 21, 2024
…e although it is masked (sonic-net#18829)

* [voq][chassis][dhcp_relay] swss.sh try to start the dhcp_relay service althoug it is masked

Signed-off-by: mlok <marty.lok@nokia.com>
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 6, 2024
…e although it is masked (sonic-net#18829)

* [voq][chassis][dhcp_relay] swss.sh try to start the dhcp_relay service althoug it is masked

Signed-off-by: mlok <marty.lok@nokia.com>
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #20182

mssonicbld pushed a commit that referenced this pull request Sep 7, 2024
…e although it is masked (#18829)

* [voq][chassis][dhcp_relay] swss.sh try to start the dhcp_relay service althoug it is masked

Signed-off-by: mlok <marty.lok@nokia.com>
vvolam pushed a commit to vvolam/sonic-buildimage that referenced this pull request Sep 12, 2024
…e although it is masked (sonic-net#18829)

* [voq][chassis][dhcp_relay] swss.sh try to start the dhcp_relay service althoug it is masked

Signed-off-by: mlok <marty.lok@nokia.com>
@mlok-nokia mlok-nokia deleted the dhcp_relay_issue branch September 27, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.