[Mellanox] Optimize thermal policies #113

Closed
wants to merge 8 commits into from

Junchao-Mellanox
Owner

Why I did it

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

stephenxs and others added 7 commits December 1, 2021 09:47
…onic-net#9133)

- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles sonic-net#8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here so that the Azure pipeline passes.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
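
As a rough illustration of the last step, the sketch below builds an inactive port list from a PORT-table-like dictionary. It assumes a port counts as inactive when its `admin_status` field is missing or not `up`; the actual criteria used by the buffer manager are not stated in this commit message, and the function name `inactive_ports` is hypothetical.

```python
def inactive_ports(port_table):
    """Return the sorted names of ports that are not administratively up.

    Assumption: a missing admin_status is treated as 'down'.
    """
    return sorted(
        name
        for name, attrs in port_table.items()
        if attrs.get("admin_status", "down") != "up"
    )


# Hypothetical sample data shaped like a CONFIG_DB PORT table.
ports = {
    "Ethernet0": {"admin_status": "up"},
    "Ethernet4": {"admin_status": "down"},
    "Ethernet8": {},
}
print(inactive_ports(ports))
```

Ports returned by such a helper would be the candidates for the zero buffer profiles mentioned above.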

Signed-off-by: Stephen Sun <stephens@nvidia.com>
…c-net#9258)

- Why I did it
When a PSU is powered off, it is still present in the switch and the airflow is unchanged. In this case, it is not necessary to set the fan speed to 100%.

- How I did it
When a PSU is powered off, don't treat it as absent.
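
The rule described here can be sketched roughly as follows. This is a minimal illustration, not the platform code: `PsuState` and `needs_full_fan_speed` are hypothetical names, and the sketch assumes that a physically absent PSU is the only PSU-related condition that forces full fan speed.

```python
from dataclasses import dataclass


@dataclass
class PsuState:
    present: bool  # PSU is physically inserted in the switch
    powered: bool  # PSU is currently supplying power


def needs_full_fan_speed(psus):
    """Return True only if at least one PSU is physically absent.

    A PSU that is present but powered off leaves the airflow unchanged,
    so it does not trigger full fan speed.
    """
    return any(not psu.present for psu in psus)


# Present but powered off: no need for 100% fan speed.
print(needs_full_fan_speed([PsuState(present=True, powered=False)]))
# Physically absent: full fan speed is requested.
print(needs_full_fan_speed([PsuState(present=False, powered=False)]))
```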

- How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt
    [8522f4f] Don't handle buffer pool watermark during warm reboot reconciling (sonic-net#1987)
  9514857 [config reload][202106] Update command reference (sonic-net#1944)
saibcm_modules_dnx submodule update contains a fix for kernel crash
…2010.1152 (sonic-net#9431)

- Why I did it
To include latest fixes.
SAI
* Reduce verbosity of warning message on shared memory already existing
* accuflow allocation support by key value

SDK
* Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
* In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
* Using SN4600C with copper or optics loopback cables in NRZ speeds, links may experience long link-up times.
* When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
* When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
* Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
* Tying the SCL and SDA of the optical modules to 3.3V causes errors.
* On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
* While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
* In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY. 
* The tunnel counter now counts dropped packets on Spectrum-2 and Spectrum-3, consistent with Spectrum behavior, and counts ECN-dropped packets as well.
* When connecting SN3800 to Cisco-9000, the fast-linkup flow will fail and the link will rise in the normal flow.
* Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
* Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason. 
* Fixed a memory leak in sx_api_port_sflow_statistics_get API. 
* During the initialization flow, the command interface used by the minimal driver and the SDK caused a collision in the firmware, since the firmware uses the same buffer for both interfaces.

- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
@Junchao-Mellanox Junchao-Mellanox deleted the fix-thermal branch June 12, 2023 04:39
Junchao-Mellanox pushed a commit that referenced this pull request Jul 13, 2023
…lly (sonic-net#15520)

#### Why I did it
src/sonic-gnmi
```
*   01fe667 - (HEAD -> master, origin/master, origin/HEAD) Merge pull request #134 from FengPan-Frank/fenpan_dialout_rename (3 days ago) [Feng-msft]
|\  
| * 994c69c - Rename --enable-dialout option into ENABLE_DIALOUT to follow the convention. (3 days ago) [Feng Pan]
|/  
* a9126da - Update makefile to support armhf (#132) (3 days ago) [ganglv]
* 0d80c0d -  prevent potential panic: return immediately if there exists error (#113) (7 days ago) [Mai Bui]
*   3c0fca3 - Merge pull request #131 from FengPan-Frank/fenpan_dialout (7 days ago) [Feng-msft]
|\  
| * c3d3266 - Add build flag into gnmi as --enable-dialout. (8 days ago) [Feng Pan]
|/  
* fd78c42 - add semgrep (#126) (2 weeks ago) [Mai Bui]
* 214fa1c - TranslClient: Use new translib subscription APIs (#122) (3 weeks ago) [Sachin Holla]
* 87d8eb3 - (origin/202305) TranslClient: use PathValidator to sanitize the request paths (#112) (3 weeks ago) [Sachin Holla]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Dec 4, 2023
…omatically (sonic-net#17331)

#### Why I did it
src/sonic-mgmt-common
```
* d96bfcd - (HEAD -> master, origin/master, origin/HEAD) YANG tree generator and linter (#113) (6 hours ago) [faraazbrcm]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request May 10, 2024
…utomatically (sonic-net#18475)

#### Why I did it
src/sonic-host-services
```
* e93494c - (HEAD -> master, origin/master, origin/HEAD) Update sonic-host-services-data.determine-reboot-cause.service (#119) (2 days ago) [Xincun Li]
* 15762a5 - Fix UT test data due to timestamp break. (#117) (4 days ago) [Feng-msft]
* d53f431 - [caclmgrd]Fix bfd and vxlan acl rules programming in acl table update scenario (#114) (13 days ago) [Sudharsan Dhamal Gopalarathnam]
* f2dbf25 - Add unittest for caclmgrd default deny rule (#113) (2 weeks ago) [Zhijian Li]
* bfa06c7 - Change dependency option to fix buildimage issue. (#110) (3 weeks ago) [Feng-msft]
* ba78bdb - Fix hostcfgd crash when delete entire config table. (#106) (4 weeks ago) [Hua Liu]
* 6130886 - Update ProcessStats query by using API instead of parsing ps command. (#103) (4 weeks ago) [Feng-msft]
```
#### How I did it
#### How to verify it
#### Description for the changelog
6 participants