Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calulcate subport values #71

Merged
merged 2 commits into from
Jul 12, 2024
Merged

Conversation

chiourung
Copy link
Collaborator

Why I did it

Merge the PR from community to support subport for CMIS breakout.
sonic-net/sonic-buildimage#18204
sonic-net/sonic-buildimage#18499

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205

Description for the changelog

Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

tshalvi and others added 2 commits July 9, 2024 02:48
- Why I did it
The field 'subport' represents the index of the split port within a physical port. For example, if a port is split into 4, the subport of the first logical port is 1, the subport of the second logical port is 2, and so on.
In xcvrd, the CMIS manager uses the subport to calculate the lane mask, which is used to control the data path per lane. In Nvidia platform, the subport is missing and is always set to 0. According to the xcvrd code, when subport=0, it will always correspond to the first logical port. Therefore, if we shut down any logical port that is not the first one, we will see the operational status of the first logical port also becomes down.
This PR aims to add the subport field to CONFIG DB and prevent such scenarios. This is applicable only for static default breakout mode. For DPB, subport calculation will happen on the fly (changes are not in Sonic yet).
(Subport HLD: HLD of subport: [link to the HLD document])

- How I did it
I have added the 'subport' field to all relevant Nvidia hwsku.json files (minigraph generation is based on them). Additionally, I introduced the new 'subport' field to portconfig.py, so that sonic-cfggen will be able to generate the minigraph with it. In this file, I also fixed an error that caused all attributes from hwsku.json to be applied only to the first logical ports associated with a physical port.
Furthermore, I updated hwsku_json_checker to include the new field and applied a fix to the sample_hwsku.json file. sample_hwsku.json is the file that sonic-config-engine's unit tests rely on for its tests. Previously, it only included attributes for the first logical port of a split physical port. For example, if Ethernet4, a 4-lane port, was split into 2 ports, then sample_hwsku.json included only the entry for Ethernet4, with no entry for Ethernet6. This misalignment with the structure of other hwsku.json files has been corrected as well.

- How to verify it
Ensure that each logical port has the correct value of 'subport' in CONFIG DB, and that shutting down a logical port affects only that port and not other ports in the split.
Automatically populate the correct subport value for an interface and insert into config_db.json

Why I did it
Problem 1
Breakouts of optic CMIS modules are broken. For example when in breakout mode of 4x100G on a port the /3, /5, /7 are not correctly associating with lanes (3,4), (5,6) and (7,8) of the transceiver respectively. The /3, /5, /7 are all trying to reprogram lanes (1,2).

In the xcvrd logs we see the following lane masks for all 4 interfaces for a port in 4x100G-2 mode:

EthernetX/X: Setting host_lanemask=0x3
EthernetX/X: Setting media_lanemask=0x1
As a result, we only see the /1 interface linking up correctly, the other 3 interfaces do not have the correct application advertisement + lanemasks applied. This is what we should expect to see:

lanes (1,2)
EthernetX/1: Setting host_lanemask=0x3
EthernetX/1: Setting media_lanemask=0x1

lanes (3,4)
EthernetX/3: Setting host_lanemask=0xc
EthernetX/3: Setting media_lanemask=0x2

lanes (5,6)
EthernetX/5: Setting host_lanemask=0x30
EthernetX/5: Setting media_lanemask=0x4

lanes (7,8)
EthernetX/7: Setting host_lanemask=0xc0
EthernetX/7: Setting media_lanemask=0x8
The problem is that “subport” is expected and used in xcvrd, xcvrd unit tests and explained in the Port Breakout HLD ; it was not implemented and does not end up in DB as expected, which defaults this value to 0.

(Pdb) self.port_dict["Ethernet194"]
{'index': 25, 'speed': '100000', 'lanes': '451,452', 'admin_status': 'up', 'cmis_state': 'INSERTED', 'cmis_retries': 0, 'cmis_expired': None}
Problem 2
This Pull Request does not play well with all HWSKUs that define breakouts. It does a lookup for every child port in the hwsku.json file - There should be 1 default breakout mode for each front panel port - this change expects every child port in a front panel port to have an entry - which causes an exception. Right now as far as I can tell this will break a lot of HWSKUs in a breakout config.

For example this is what the Dynamic Port Breakout feature uses on an Arista SKU to define 2x200G on the first front panel port:

"Ethernet0": {
     "default_brkout_mode": "2x200G[100G,50G,40G,25G,10G]"
},
but it currently expects:

"Ethernet0": {
    ...
},
"Ethernet4": {
    ...
},
How I did it
Calculate subport value automatically based on the number of child ports used per front panel port and insert it into CONFIG_DB for the interface. If that number is 1 for the front panel, subport 0 is used. If has more than 1 I.e. is in breakout mode use a 1 based subport number as the feature expects. For example:

1x400G-8:

Ethernet0: subport 0
4x100G-2:

Ethernet0: subport 1
Ethernet2: subport 2
Ethernet4: subport 3
Ethernet6: subport 4
I have kept it compatible with Mellanox change - if you want to define a different subport number in your hwsku.json that is fine and it will override the dynamically generated default.

How to verify it
I verified this by checking the output of sonic-cfggen, updating config_db. and checking that both the host and media lane masks were correct for a 400GBASE-DR4 xcvr in both 2x200G-4 mode and 4x100G-2 mode.

All links came up and xcvrd prints the correct lanemasks in logs.

I also made sure that portconfig.py no longer causes an exception; it only checks for the optional HWSKU value if childports are present in hwsku.json file (this is to make sure we don't break Mellanox hwsku's)

I also changed the unit test for config_db generation and ensured the correct values match. The test passes.
chaoskao pushed a commit to chaoskao/sonic-buildimage-ec that referenced this pull request Jul 11, 2024
…utomatically (#15992)

#### Why I did it
src/sonic-host-services
```
* 6767bc7 - (HEAD -> master, origin/master, origin/HEAD) [FeatureD] Move the Feature related config from Hostcfgd into a new daemon (edge-core#71) (6 days ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
@bryan1978 bryan1978 merged commit 4554ea0 into edge-core:202311.X Jul 12, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants