[Mellanox] Optimize thermal policies #113

Closed
wants to merge 8 commits into from

Commits on Dec 1, 2021

  1. [Reclaim buffer] Common infrastructure update for reclaiming buffer (sonic-net#9133)
    
    - Why I did it
    This updates the common sonic-buildimage infrastructure for reclaiming buffers.
    
    - How I did it
    Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffers.
    The zero profiles will be referenced on Mellanox platforms by the PR "[Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles" (sonic-net#8768), which will also carry the test cases that verify the behavior.
    Rendering is done here so that the Azure pipeline passes.
    Load zero_profiles.json when the dynamic buffer manager starts.
    Generate the list of inactive ports whose buffers are to be reclaimed.
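
The last step, generating the inactive port list, might look roughly like the sketch below. It is an illustration, not the code from this commit: the function name `inactive_ports`, the table layout, and the rule that a port is inactive when its `admin_status` is anything other than `up` are all assumptions.

```python
from typing import Dict, List


def inactive_ports(port_table: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of ports that are not administratively up.

    Assumption: a port counts as inactive when its admin_status is anything
    other than "up", including when the field is missing altogether.
    Ports are returned in the order they appear in the table.
    """
    return [
        name
        for name, attrs in port_table.items()
        if attrs.get("admin_status", "down") != "up"
    ]


# Hypothetical PORT table, shaped like a small configuration excerpt.
ports = {
    "Ethernet0": {"admin_status": "up"},
    "Ethernet4": {"admin_status": "down"},
    "Ethernet8": {},
}
print(inactive_ports(ports))
```

A list like this would then tell the buffer manager which ports should receive the zero profiles.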
    
    Signed-off-by: Stephen Sun <stephens@nvidia.com>
    stephenxs authored and judyjoseph committed Dec 1, 2021
    fa0ae42
  2. [Mellanox] Fan speed should not be 100% when PSU is powered off (sonic-net#9258)
    
    - Why I did it
    When a PSU is powered off, it is still installed in the switch, so the air flow is unchanged. In this case, it is not necessary to set the fan speed to 100%.
    
    - How I did it
    When a PSU is powered off, don't treat it as absent.
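
The distinction between a powered-off PSU and an absent one can be sketched as follows. This is a minimal illustration and not the platform code changed by this commit: `PsuStatus`, `any_psu_absent`, `fan_speed_percent`, and the 60% normal speed are hypothetical, and the sketch assumes that an absent PSU is what triggers 100% fan speed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PsuStatus:
    """Hypothetical PSU state; the field names are illustrative only."""
    present: bool  # PSU is physically installed in the switch
    powered: bool  # PSU is currently supplying power


def any_psu_absent(psus: Iterable[PsuStatus]) -> bool:
    """Return True only when a PSU is physically missing.

    A PSU that is present but powered off still occupies its slot, so the
    air flow is unchanged and it is deliberately not counted as absent.
    """
    return any(not psu.present for psu in psus)


def fan_speed_percent(psus: Iterable[PsuStatus], normal_speed: int = 60) -> int:
    """Return 100 if any PSU is absent, else the normal speed (assumed value)."""
    return 100 if any_psu_absent(psus) else normal_speed
```

With this split, `fan_speed_percent([PsuStatus(present=True, powered=False)])` stays at the normal speed, while a slot reported as not present raises the speed to 100%.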
    
    - How to verify it
    Adjust existing unit test case
    Add new case in sonic-mgmt
    Junchao-Mellanox authored and judyjoseph committed Dec 1, 2021
    e4ff4d2
  3. f5c847b
  4. Update sonic-swss submodule

        [8522f4f] Don't handle buffer pool watermark during warm reboot reconciling (sonic-net#1987)
    judyjoseph committed Dec 1, 2021
    41a2d3e

Commits on Dec 2, 2021

  1. Update sonic-utilities submodule

      9514857 [config reload][202106] Update command reference (sonic-net#1944)
    judyjoseph committed Dec 2, 2021
    70b24ad

Commits on Dec 4, 2021

  1. [broadcom]: update bcm dnx gpl module pointer (sonic-net#9442)

    saibcm_modules_dnx submodule update contains a fix for kernel crash
    smaheshm committed Dec 4, 2021
    3f3dceb

Commits on Dec 6, 2021

  1. [Mellanox] [202106] Update SAI to v1.20.0.1 and SDK/FW to v4.5.1156/v2010.1152 (sonic-net#9431)
    
    - Why I did it
    To include latest fixes.
    SAI
    * Reduce verbosity of warning message on shared memory already existing
    * accuflow allocation support by key value
    
    SDK
    * Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
    * In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
    * When using SN4600C with copper or optics loopback cables at NRZ speeds, the link may take a long time to come up.
    * When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
    * When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
    * Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
    * Tying the SCL and SDA of the optical modules to 3.3V causes errors.
    * On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
    * While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
    * In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY. 
    * The tunnel counter now counts dropped packets on Spectrum-2 and Spectrum-3, consistent with Spectrum behavior, and also counts ECN-dropped packets.
    * When connecting SN3800 to Cisco-9000, the fast-linkup flow will fail and the link will come up through the normal flow.
    * Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
    * Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason. 
    * Fixed a memory leak in sx_api_port_sflow_statistics_get API. 
    * During the initialization flow, the command interfaces used by the minimal driver and the SDK collided in the firmware, because the firmware uses the same buffer for both interfaces.
    
    - How I did it
    Updated SDK/SAI submodule and relevant makefiles with the required versions.
    
    - How to verify it
    Build an image and run tests from "sonic-mgmt".
    
    Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
    volodymyrsamotiy committed Dec 6, 2021
    b22db5a

Commits on Dec 7, 2021

  1. cb7ff47