[chassis][midplane] Add notification to Supervisor when LC is graceful reboot #3292

Merged
merged 2 commits into from
May 15, 2024

Conversation

mlok-nokia
Contributor

@mlok-nokia mlok-nokia commented Apr 26, 2024

What I did

Modify the "sudo reboot" script to notify the Supervisor card by inserting a "CHASSIS_MODULE_REBOOT_INFO_TABLE|LINE-CARD#" entry into CHASSIS_STATE_DB when the reboot command is issued on a linecard. This gives the Supervisor enough information to log a proper message, which addresses issue sonic-net/sonic-buildimage#18540

How I did it

Add a new function linecard_reboot_notify_supervisor() to the reboot script. If the platform is a linecard in a chassis, the function calls sonic-db-cli to add a "CHASSIS_MODULE_REBOOT_INFO_TABLE|LINE-CARD#" entry to CHASSIS_STATE_DB. This gives chassisd on the Supervisor card the information it needs to log a proper message.
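The notification step described above can be sketched roughly as follows. This is an illustrative Python model, not the code from the PR (the actual change is in the reboot shell script). The helper names `build_notify_command` and `notify_supervisor`, and the hash field and value ("reboot" / "expected"), are assumptions made for the example; only the table name, the database name, and the use of sonic-db-cli come from the description.

```python
import subprocess
from typing import Callable, List

TABLE = "CHASSIS_MODULE_REBOOT_INFO_TABLE"


def build_notify_command(module_name: str,
                         field: str = "reboot",      # assumed field name
                         value: str = "expected"     # assumed field value
                         ) -> List[str]:
    """Build the sonic-db-cli argv that writes the reboot-info entry."""
    key = f"{TABLE}|{module_name}"
    return ["sonic-db-cli", "CHASSIS_STATE_DB", "hset", key, field, value]


def notify_supervisor(module_name: str,
                      is_linecard: bool,
                      run: Callable[[List[str]], int] = subprocess.call) -> bool:
    """Write the entry only when running on a linecard.

    Returns True if the entry was written, False otherwise. `run` is
    injectable so the sketch can be exercised without a SONiC device.
    """
    if not is_linecard:
        return False
    rc = run(build_notify_command(module_name))
    if rc != 0:
        # The second commit in this PR adds a log message for this failure case.
        print(f"Failed to create {TABLE}|{module_name} in CHASSIS_STATE_DB")
        return False
    return True
```

On a real linecard the default `run` would invoke sonic-db-cli directly; passing a stub keeps the example testable elsewhere.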
This PR requires the following PRs to address issue sonic-net/sonic-buildimage#18540:
sonic-net/sonic-buildimage#18805
sonic-net/sonic-platform-daemons#480
sonic-net/sonic-buildimage#18862

This PR is needed by branch 202205

How to verify it

  1. Test the expected log. Use the CLI command "sudo reboot" to reboot a linecard, then check the syslog on the Supervisor. The following message is logged:
Apr 25 19:44:40.818378 ixre-cpm-chassis7 WARNING pmon#chassisd: Expected: Module LINE-CARD0 lost midplane connectivity
  2. Test the unexpected log. Use "sudo /sbin/reboot", or reboot a linecard by any crash method, then check the syslog on the Supervisor. The following message is logged:
Apr 25 19:50:22.549416 ixre-cpm-chassis7 WARNING pmon#chassisd: Unexpected: Module LINE-CARD0 lost midplane connectivity
  3. Test the expected reboot with the timeout case. Use the CLI command "sudo reboot" on a linecard and keep it down for more than 4 minutes. The following messages are logged:
Apr 25 01:25:53.877143 ixre-cpm-chassis7 WARNING sr_device_mgr: Unable to reach slot 1 (Linecard) via Midplane
Apr 25 01:25:58.402511 ixre-cpm-chassis7 WARNING pmon#chassisd: Module LINE-CARD0 went off-line!
Apr 25 01:26:01.658959 ixre-cpm-chassis7 WARNING pmon#chassisd: Expected: Module LINE-CARD0 lost midplane connectivity.
(3 minutes after the first log)
Apr 25 01:29:10.259527 ixre-cpm-chassis7 WARNING pmon#chassisd: Unexpected: Module LINE-CARD0 midplane connectivity is not restored in 180 seconds
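
The three test cases above imply a small decision rule on the Supervisor side. The sketch below models that rule in Python, assuming the Supervisor can tell whether a reboot-info entry exists for the module. The function names are hypothetical and this is not the chassisd implementation from sonic-net/sonic-platform-daemons#480. The message texts mirror the syslog lines shown above.

```python
from typing import Optional

TIMEOUT_SECONDS = 180  # value seen in the timeout log line above


def connectivity_lost_message(module: str, reboot_expected: bool) -> str:
    """Message to log when a module first loses midplane connectivity."""
    prefix = "Expected" if reboot_expected else "Unexpected"
    return f"{prefix}: Module {module} lost midplane connectivity"


def still_down_message(module: str,
                       reboot_expected: bool,
                       seconds_down: float,
                       timeout: int = TIMEOUT_SECONDS) -> Optional[str]:
    """Message to log if an expected reboot has not come back in time.

    Returns None while the module is still within the timeout, or when
    the loss was already reported as unexpected.
    """
    if reboot_expected and seconds_down >= timeout:
        return (f"Unexpected: Module {module} midplane connectivity "
                f"is not restored in {timeout} seconds")
    return None
```

Under this model, a "sudo reboot" produces the "Expected" line first and, only if the linecard stays down past the timeout, the second "Unexpected ... not restored" line.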

Previous command output (if the output of a command-line utility has changed)

NA

New command output (if the output of a command-line utility has changed)

NA

…l reboot

Signed-off-by: mlok <marty.lok@nokia.com>
@mlok-nokia
Contributor Author

@deepak-singhal0408 @judyjoseph This PR addresses the issue with logging lost midplane connectivity. There are 3 PRs in total. Please review them. Thanks

@abdosi
Contributor

abdosi commented Apr 30, 2024

@bmridul please help review.

@abdosi abdosi requested a review from bmridul April 30, 2024 16:22
…ntry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
@gechiang
Contributor

gechiang commented May 3, 2024

@mlok-nokia ,
What is the dependency of this PR on "sonic-net/sonic-platform-daemons#480"?
If, say, we backport this to the .msft repo 202205 branch but not the platform-daemons PR (480), will there be any build or functionality issue? I am asking because I don't think "sonic-net/sonic-platform-daemons#480" will be allowed into the 202205 branch, and since we don't have a .msft 202205 repo for the platform-daemons submodule, the bug fix will be incomplete for the community building with 202205. We should still be able to make an internal build with a patch. I just want to make sure there is no negative impact on the rest of the community.
Please confirm.
Thanks!

@mlok-nokia
Contributor Author

There should not be any functionality impact. PR #3292 just creates an entry in CHASSIS_STATE_DB for platform-daemons PR sonic-net/sonic-platform-daemons#480 to use. If the platform-daemons PR is not in the branch, the data in the DB will simply not be used.

Contributor

@judyjoseph judyjoseph left a comment

LGTM

@judyjoseph
Contributor

@kenneth-arista could you review as well

@rlhui rlhui merged commit 547d5ee into sonic-net:master May 15, 2024
7 checks passed
@rlhui rlhui added the p0 label May 15, 2024
@gechiang
Contributor

gechiang commented May 15, 2024

MSFT ADO: 28074312
@StormLiangMS , @yxieca , Please help review/approve this BUG FIX for chassis for 202305 and 202311 branches.
Thanks!

mssonicbld pushed a commit to mssonicbld/sonic-utilities that referenced this pull request May 15, 2024
…l reboot (sonic-net#3292)

* [chassis][midplane] Add notification to Supervisor when LC is graceful reboot

* Address review comment by adding log message when failed to create entry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
@mssonicbld
Collaborator

Cherry-pick PR to 202311: #3324

mssonicbld pushed a commit that referenced this pull request May 15, 2024
…l reboot (#3292)

* [chassis][midplane] Add notification to Supervisor when LC is graceful reboot

* Address review comment by adding log message when failed to create entry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
rlhui pushed a commit to sonic-net/sonic-buildimage that referenced this pull request May 31, 2024
… for Nokia-IXR7250E platform (#18862)

This PR adds the platform-specific linecard_reboot_timeout value to platform_env.conf. It works with PR sonic-net/sonic-platform-daemons#480 and sonic-net/sonic-utilities#3292 to address issue #18540

Signed-off-by: mlok <marty.lok@nokia.com>
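
As a rough illustration of how a consumer might pick up the linecard_reboot_timeout value mentioned in the commit message above, here is a small Python sketch. It assumes the configuration file uses simple `key=value` lines and that a fallback of 180 seconds applies when the key is absent (180 is the value seen in the timeout log, not a confirmed default). The function name is hypothetical and this is not the code from the referenced PRs.

```python
def read_linecard_reboot_timeout(conf_text: str, default: int = 180) -> int:
    """Return linecard_reboot_timeout from key=value text, or the default.

    Blank lines, comment lines, and lines without '=' are ignored.
    A non-integer value falls back to the default.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "linecard_reboot_timeout":
            try:
                return int(value.strip())
            except ValueError:
                return default
    return default
```

Taking the file contents as a string keeps the parsing separate from file access, so the path to the configuration file does not need to be assumed here.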
@gechiang gechiang added the included in chassis for 202205 branch indicate that this PR got merged into the "chassis for 202205 branch" label Jun 13, 2024
@gechiang
Contributor

@StormLiangMS , no more backport allowed for 202305?? can you help review/approve/deny the backport request to 202305?
Thanks!

arfeigin pushed a commit to arfeigin/sonic-utilities that referenced this pull request Jun 16, 2024
…l reboot (sonic-net#3292)

* [chassis][midplane] Add notification to Supervisor when LC is graceful reboot

* Address review comment by adding log message when failed to create entry in CHASSIS_STATE_DB

Signed-off-by: mlok <marty.lok@nokia.com>
arun1355492 pushed a commit to arun1355492/sonic-buildimage that referenced this pull request Jul 26, 2024
… for Nokia-IXR7250E platform (sonic-net#18862)

This PR adds the platform-specific linecard_reboot_timeout value to platform_env.conf. It works with PR sonic-net/sonic-platform-daemons#480 and sonic-net/sonic-utilities#3292 to address issue sonic-net#18540

Signed-off-by: mlok <marty.lok@nokia.com>
Projects
Status: Done
9 participants