[Monit] Deprecate the feature of monitoring the critical processes by Monit #7676

Merged

Contributor

@yozhao101 yozhao101 commented May 21, 2021

Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it

We now use Supervisor to monitor the running status of critical processes in each container, which is more reliable and flexible than monitoring them with Monit. This PR therefore removes the Monit-based monitoring of critical processes.
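
As an illustration of the Supervisor-based approach, here is a minimal sketch of how an event listener could decide that a critical process exited unexpectedly. It relies only on Supervisor's documented event notification format (space-separated `key:value` tokens in the header and in the `PROCESS_STATE_EXITED` payload). The function names and the set of critical processes are hypothetical and are not taken from this PR; the listener that SONiC actually ships may work differently.

```python
def parse_tokens(line):
    """Parse Supervisor's space-separated key:value tokens into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def is_unexpected_critical_exit(header_line, payload, critical_processes):
    """Return True if the event reports an unexpected exit of a critical process.

    header_line and payload follow Supervisor's event listener protocol.
    critical_processes is a hypothetical set of process names to watch.
    """
    header = parse_tokens(header_line)
    if header.get("eventname") != "PROCESS_STATE_EXITED":
        return False
    fields = parse_tokens(payload)
    # Supervisor reports expected:0 when the exit code was not an expected one.
    return (
        fields.get("processname") in critical_processes
        and fields.get("expected") == "0"
    )
```

A real listener would also print `READY` before reading each event from stdin and acknowledge it with a `RESULT` line, as the protocol requires; that loop is omitted here to keep the decision logic easy to read.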

How I did it

I removed the script process_checker and the corresponding Monit configuration entries for critical processes.

How to verify it

I verified this on the device str-7260cx3-acs-1.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • [x] 202012

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

Monit.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@jleveque
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jleveque
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

processes in streaming telemetry.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 marked this pull request as ready for review June 1, 2021 18:11
yozhao101 added a commit to sonic-net/sonic-mgmt that referenced this pull request Jun 1, 2021
…2012 image (#3559)

What is the motivation for this PR?
Since Supervisord will replace Monit for monitoring critical processes, this test needs to skip testbeds installed with a 202012 or newer image. The test also needs to handle the error raised when the command sudo monit status 'lldp|lldpmgrd' returns a non-zero exit code.

I encountered the following error when this PR (sonic-net/sonic-buildimage#7676) was tested on a virtual testbed.

monit/test_monit_status.py::test_monit_status[vlab-03] PASSED            [ 50%]
monit/test_monit_status.py::test_monit_reporting_message[vlab-03] 
-------------------------------- live log call ---------------------------------
02:11:26 utilities.wait_until                     L0068 ERROR  | Exception caught while checking check_monit_last_output: IndexError('list index out of range',)
02:12:26 utilities.wait_until                     L0068 ERROR  | Exception caught while checking check_monit_last_output: IndexError('list index out of range',)
02:13:27 utilities.wait_until                     L0068 ERROR  | Exception caught while checking check_monit_last_output: IndexError('list index out of range',)
FAILED   
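
The IndexError in the log above is consistent with indexing into command output that no longer contains the expected line, although the log itself does not confirm the cause. The sketch below shows one defensive way to read the "last output" field from `monit status` output. The shape of `result` (a dict with `rc` and `stdout_lines`) and the function name are assumptions for illustration, not the test's actual code.

```python
def monit_last_output(result):
    """Return the text of Monit's 'last output' field, or None if unavailable.

    result is a hypothetical dict with an integer 'rc' (exit code) and a
    list of strings 'stdout_lines' holding the command's output.
    """
    # A non-zero exit code means the queried entry may not exist at all.
    if result.get("rc", 1) != 0:
        return None
    for line in result.get("stdout_lines", []):
        stripped = line.strip()
        if stripped.startswith("last output"):
            return stripped[len("last output"):].strip()
    # The field was not found; report that instead of raising IndexError.
    return None
```

Returning None for both failure modes lets a polling helper keep retrying without an exception being raised on every attempt.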

How did you do it?
I used pytest_require(...) to skip testbeds installed with a 202012 or newer image.
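
A minimal sketch of the version check behind such a skip, assuming the image version string starts with a six-digit release such as 202012. The function name, the sample version strings, and the treatment of non-numeric names such as "master" are assumptions, not the test's actual code.

```python
import re

# First release for which the test should be skipped, per the PR text.
FIRST_RELEASE_TO_SKIP = 202012


def should_skip_monit_test(os_version):
    """Return True if the image is 202012 or newer.

    os_version is assumed to begin with a six-digit release, e.g. "202012.15".
    """
    match = re.match(r"(\d{6})", os_version)
    if match is None:
        # Assumption: names without a numeric release (e.g. "master")
        # are development images newer than any numbered release.
        return True
    return int(match.group(1)) >= FIRST_RELEASE_TO_SKIP
```

In a pytest test, the result could be passed to pytest.skip(...) or to a helper such as pytest_require(...) so that the test is reported as skipped rather than failed.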

How did you verify/test it?
I verified this change on the testbed str-msn2700-03.

Any platform specific information?
N/A

Supported testbed topology if it's a new test case?
N/A
@yozhao101
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Collaborator

@qiluo-msft qiluo-msft left a comment

:shipit:

@yozhao101 yozhao101 merged commit 1a3cab4 into sonic-net:master Jun 4, 2021
@yozhao101 yozhao101 deleted the remove_monitoring_processes_monit branch June 4, 2021 17:16
@qiluo-msft
Collaborator

This could not be cleanly cherry-picked to 202012. Please submit another PR.

@yozhao101
Contributor Author

> This could not be cleanly cherry-picked to 202012. Please submit another PR.

I will submit a separate PR for 202012 branch.

judyjoseph added a commit to judyjoseph/sonic-buildimage that referenced this pull request Jun 29, 2021
judyjoseph added a commit that referenced this pull request Jun 29, 2021
Remove the references to the file monit_syncd from docker-syncd-brcm-dnx, which were missed because PR #7598 overlapped with #7676.
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
… Monit (sonic-net#7676)

carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
Remove the references to the file monit_syncd from docker-syncd-brcm-dnx, which were missed because PR sonic-net#7598 overlapped with sonic-net#7676.
vmittal-msft pushed a commit to vmittal-msft/sonic-mgmt that referenced this pull request Sep 28, 2021
…2012 image (sonic-net#3559)
