Support handling the error status of SFP modules #80

Merged

stephenxs
Collaborator

  1. Update get_change_event
    • Return event in a bitmap format in case there is an error
    • Return an extra map indexed by sfp_error in case there is a Mellanox-specific error
    • Indicate whether the error is a blocking error
  2. Expose the error status of SFP modules to CLI
    • A platform API calls SDK API to fetch the error status
      • Display the SFP module state: plugged, unplugged and plugged with error.
      • The SFP error state is displayed only if the SFP module state is plugged with error
    • The CLI will call platform API and provide user-friendly output
  3. Redirect stderr to /dev/null in order to eliminate errors in case SFP module is not plugged
  4. Beautify the code, making all the state/error/descriptions defined in a uniform place
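
The shape of the event data described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the bit values, the `"sfp"` key, the choice of which errors are blocking, and the helper names `encode_event` and `build_change_event` are all assumptions made for the example. Only the `sfp_error` key and the constant names come from the description and diff.

```python
# Hypothetical bit values for illustration; the real constants live in the platform code.
SFP_STATUS_BIT_INSERTED = 0x00000001
SFP_ERROR_BIT_BLOCKING = 0x00000002
SFP_ERROR_BIT_HIGH_TEMP = 0x00000040
SFP_ERROR_BIT_BAD_CABLE = 0x00000080

# Assumption: which errors count as blocking is chosen here only for the example.
BLOCKING_ERROR_BITS = SFP_ERROR_BIT_HIGH_TEMP | SFP_ERROR_BIT_BAD_CABLE


def encode_event(error_bits):
    """Combine the 'inserted' bit with error bits into one bitmap value."""
    event = SFP_STATUS_BIT_INSERTED | error_bits
    if error_bits & BLOCKING_ERROR_BITS:
        # Flag the event as blocking when any blocking error bit is set.
        event |= SFP_ERROR_BIT_BLOCKING
    return event


def build_change_event(port_error_bits, vendor_errors):
    """Build the event dict; add 'sfp_error' only when vendor-specific errors exist."""
    events = {port: encode_event(bits) for port, bits in port_error_bits.items()}
    result = {"sfp": events}
    extra = {port: desc for port, desc in vendor_errors.items() if port in events}
    if extra:
        result["sfp_error"] = extra
    return result
```

With these assumed values, a port reporting a high-temperature error yields a bitmap with the inserted, blocking, and high-temperature bits set, while a healthy port yields only the inserted bit.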

Signed-off-by: Stephen Sun <stephens@nvidia.com>

Why I did it

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

@@ -17,6 +17,7 @@ class MockSxFd(object):
new_sx_fd_t_p = MagicMock(return_value=MockSxFd())
new_sx_user_channel_t_p = MagicMock()
from sonic_py_common.logger import Logger
from sonic_platform.sfp import SFP
Owner

from .sfp import SFP

Collaborator Author

Fixed

0x5: SFP.SFP_ERROR_BIT_UNSUPPORTED_CABLE,
0x6: SFP.SFP_ERROR_BIT_HIGH_TEMP,
0x7: SFP.SFP_ERROR_BIT_BAD_CABLE,
0x1: 0x00010000,
Owner

Could you please add comments to these numbers?

Collaborator Author

Replaced with constant definition.
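
One way the "constant definition" approach might look is sketched below. The keys 0x5 to 0x7 and the `SFP_ERROR_BIT_*` names come from the diff above; the `SDK_ERROR_TYPE_*` names, the numeric bit values, and the helper `sdk_error_to_bit` are hypothetical and only illustrate replacing bare numbers with named constants.

```python
# Hypothetical names for the SDK error codes; 0x5-0x7 are the keys shown in the diff.
SDK_ERROR_TYPE_UNSUPPORTED_CABLE = 0x5
SDK_ERROR_TYPE_HIGH_TEMP = 0x6
SDK_ERROR_TYPE_BAD_CABLE = 0x7

# Assumed bit values; the real ones are defined on the SFP class.
SFP_ERROR_BIT_UNSUPPORTED_CABLE = 0x00000020
SFP_ERROR_BIT_HIGH_TEMP = 0x00000040
SFP_ERROR_BIT_BAD_CABLE = 0x00000080

# Named constants on both sides make the mapping self-documenting.
SDK_ERROR_TO_ERROR_BIT = {
    SDK_ERROR_TYPE_UNSUPPORTED_CABLE: SFP_ERROR_BIT_UNSUPPORTED_CABLE,
    SDK_ERROR_TYPE_HIGH_TEMP: SFP_ERROR_BIT_HIGH_TEMP,
    SDK_ERROR_TYPE_BAD_CABLE: SFP_ERROR_BIT_BAD_CABLE,
}


def sdk_error_to_bit(sdk_error_type):
    """Translate an SDK error code to its error bit; unknown codes map to 0."""
    return SDK_ERROR_TO_ERROR_BIT.get(sdk_error_type, 0)
```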

7: cls.SFP_ERROR_DESCRIPTION_BAD_CABLE,
8: cls.SFP_MLNX_ERROR_DESCRIPTION_PMD_TYPE_NOT_ENABLED,
12: cls.SFP_MLNX_ERROR_DESCRIPTION_PCIE_POWER_SLOT_EXCEEDED,
255: cls.SFP_STATUS_OK
Owner

Why do we need this?

Collaborator Author

Do you mean cls.SFP_STATUS_OK? 255 is also a value that can be returned from the PMAOS register. It's defined in the PRM.
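
As a rough sketch of why the 255 entry matters: if the register can return 255 for a healthy module, a lookup table without that key would treat a healthy module as an unknown error. The keys 7, 8, 12 and 255 and the constant names come from the diff above; the description strings, the fallback text, and the helper `describe_error_type` are assumptions for illustration.

```python
# Constant names follow the diff; the string values are assumed wording.
SFP_ERROR_DESCRIPTION_BAD_CABLE = "Bad cable"
SFP_MLNX_ERROR_DESCRIPTION_PMD_TYPE_NOT_ENABLED = "PMD type not enabled"
SFP_MLNX_ERROR_DESCRIPTION_PCIE_POWER_SLOT_EXCEEDED = "PCIe power slot exceeded"
SFP_STATUS_OK = "OK"

ERROR_TYPE_TO_DESCRIPTION = {
    7: SFP_ERROR_DESCRIPTION_BAD_CABLE,
    8: SFP_MLNX_ERROR_DESCRIPTION_PMD_TYPE_NOT_ENABLED,
    12: SFP_MLNX_ERROR_DESCRIPTION_PCIE_POWER_SLOT_EXCEEDED,
    255: SFP_STATUS_OK,  # a valid register value meaning no error
}


def describe_error_type(error_type):
    """Return a readable description, with a fallback for unmapped values."""
    return ERROR_TYPE_TO_DESCRIPTION.get(
        error_type, "Unknown error ({})".format(error_type)
    )
```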

platform/mellanox/mlnx-platform-api/tests/test_sfp.py (outdated, resolved)
@stephenxs stephenxs marked this pull request as ready for review June 7, 2021 11:55
@Junchao-Mellanox Junchao-Mellanox merged commit cb5135f into Junchao-Mellanox:sfp-bit-map Jun 8, 2021
@stephenxs stephenxs deleted the sfp-bit-map-enhance branch June 29, 2021 05:43
Junchao-Mellanox pushed a commit that referenced this pull request Mar 14, 2022
e56e9b4 Fix CVE-2021-3121 warning (#96)
bf1be4f [ci]: Support code diff coverage threshold 50% (#94)
64e516c Ported Marvell armhf build on x86 for debian buster to use cross-compilation instead of qemu emulation (#80)
e426388 [ci]: Support azp code coverage (#87)
Junchao-Mellanox pushed a commit that referenced this pull request May 6, 2022
e56e9b4 Fix CVE-2021-3121 warning (#96)
bf1be4f [ci]: Support code diff coverage threshold 50% (#94)
64e516c Ported Marvell armhf build on x86 for debian buster to use cross-compilation instead of qemu emulation (#80)
e426388 [ci]: Support azp code coverage (#87)
Junchao-Mellanox pushed a commit that referenced this pull request Jun 27, 2022
[sonic-linkmgrd][master] Submodule Update

9c8a16e Jing Zhang Sun Jun 5 08:27:07 2022 -0700 Separate I2C mux state probing and gRPC forwarding state probing (#86)
491c4ee Longxiang Lyu Sun Jun 5 23:26:19 2022 +0800 Fix peer mux wait back off factor (#84)
a0b6b14 Jing Zhang Wed Jun 1 10:33:12 2022 -0700 Revert "Update log level for mux probing and mux state chance (#23)" (#85)
3c2b546 Jing Zhang Tue May 31 10:14:42 2022 -0700 Add default route support to active-active state machine (#78)
6fa892e Longxiang Lyu Fri May 27 09:15:06 2022 +0800 Degrade LinkProberStateMachineBase virtual function logging level (#80)
7b695ca Longxiang Lyu Fri May 27 09:14:02 2022 +0800 Fix mux wait timer and peer mux wait timer (#81)
Junchao-Mellanox pushed a commit that referenced this pull request Aug 17, 2022
linkmgrd:
* 3c2b546 2022-05-31 | Add default route support to `active-active` state machine (#78) (github/202205, master, 202205) [Jing Zhang]
* 6fa892e 2022-05-27 | Degrade `LinkProberStateMachineBase` virtual function logging level (#80) [Longxiang Lyu]
* 7b695ca 2022-05-27 | Fix mux wait timer and peer mux wait timer (#81) [Longxiang Lyu]

platform-daemons:
* 0d90023 2022-05-31 | grpc client implementation for active-active dualtor (sonic-net#248) (github/master, github/202205, master, 202205) [vdahiya12]
* 6b8bf69 2022-05-27 | [ycabled] Fix some syntax warnings in ycabled (sonic-net#263) [vdahiya12]
* 2bcf936 2022-05-24 | [ycabled] fix the posting for mux_cable_static_info per downlink when ycabled is spawned; synchronizing executing Telemetry API (sonic-net#257) [vdahiya12]
* ce217c0 2022-04-25 | Include changes from xcvr_api in transceiver_info table (sonic-net#253) [qinchuanares]
* e0f8a35 2022-04-22 | Fix checkReplyType failed issue via recreating xcvr_table_helper on forking subprocess (sonic-net#255) [Stephen Sun]

platform-common:
* f575a40 2022-05-24 | [Credo][Ycable] changes for synchronizing executing Telemetry API's when mux toggle is inprogress (sonic-net#280) (github/202205, master, 202205) [vdahiya12]
* b043372 2022-05-11 | [sonic_ssd] Nokia-7215: "show platform ssdhealth" not showing health percent (sonic-net#279) [bill-nokia]
* d62d3d6 2022-05-04 | [CMIS]Fix low-power to high power mode transition (sonic-net#268) [Prince George]
* f918125 2022-05-02 | [syseeprom] Enable display of vendor extension TLV content (sonic-net#270) [dflynn-Nokia]
* 4e08440 2022-04-14 | [Credo][Ycable] improve logging for Server Powered off/Faulty cables (sonic-net#272) [vdahiya12]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Junchao-Mellanox pushed a commit that referenced this pull request Jun 25, 2023
…atically (sonic-net#15544)

src/sonic-telemetry

*   0b8843c - (HEAD -> 202205, origin/202205) Merge pull request #120 from zbud-msft/202205_divide_by_zero (3 hours ago) [Ying Xie]
| *   015defa - Merge branch '202205' into 202205_divide_by_zero (5 hours ago) [Zain Budhwani]
* | de2124b - Change log level (5 hours ago) [zbud-msft]
* | f203be5 - Add logs for md5 checksum (#80) (5 hours ago) [Zain Budhwani]
| *   ea6c84b - Merge branch '202205' into 202205_divide_by_zero (31 hours ago) [Zain Budhwani]
* | ab98380 - Fix sonic-mgmt-common version to ec32690 in pipeline (#123) (34 hours ago) [Sachin Holla]
* 5fcecef - Merge branch '202205' into 202205_divide_by_zero (4 days ago) [Ying Xie]
* 09c8bfc - Merge branch '202205' into 202205_divide_by_zero (11 days ago) [Zain Budhwani]
* 21b9bc8 - Fix crash when retrieving cpu utilization (#70) (#71) (11 days ago) [Zain Budhwani]
Junchao-Mellanox pushed a commit that referenced this pull request Mar 13, 2024
…e latest HEAD automatically (sonic-net#18168)

#### Why I did it
src/wpasupplicant/sonic-wpa-supplicant
```
* d41110905 - (HEAD -> master, origin/master, origin/HEAD) Merge pull request #80 from wumiaont/master (4 days ago) [Kamil Cudnik]
* 7406f4ba4 - Fix compile issues with debian bookworm (10 days ago) [wumiaont]
* caed4ef71 - Fix compile issues with bookworm and openssl 3.0 (10 days ago) [wumiaont]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request May 16, 2024
…utomatically (sonic-net#18900)

#### Why I did it
src/sonic-host-services
```
* aa84129 - (HEAD -> master, origin/master, origin/HEAD) Updated tacacs test (#123) (17 hours ago) [ycoheNvidia]
* 9e6404c - Add LDAP feature support (#80) (6 days ago) [davidpil2002]
```
#### How I did it
#### How to verify it
#### Description for the changelog