Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calulcate subport values and fix key error in hwsku.json #18499

Merged
merged 10 commits into from
May 17, 2024
Merged

Calulcate subport values and fix key error in hwsku.json #18499

merged 10 commits into from
May 17, 2024

Conversation

bobbymcgonigle
Copy link
Contributor

Automatically populate the correct subport value for an interface and insert into config_db.json

Why I did it

Problem 1

Breakouts of optic CMIS modules are broken. For example when in breakout mode of 4x100G on a port the /3, /5, /7 are not correctly associating with lanes (3,4), (5,6) and (7,8) of the transceiver respectively. The /3, /5, /7 are all trying to reprogram lanes (1,2).

In the xcvrd logs we see the following lane masks for all 4 interfaces for a port in 4x100G-2 mode:

EthernetX/X: Setting host_lanemask=0x3
EthernetX/X: Setting media_lanemask=0x1

As a result, we only see the /1 interface linking up correctly, the other 3 interfaces do not have the correct application advertisement + lanemasks applied. This is what we should expect to see:

lanes (1,2)
EthernetX/1: Setting host_lanemask=0x3
EthernetX/1: Setting media_lanemask=0x1

lanes (3,4)
EthernetX/3: Setting host_lanemask=0xc
EthernetX/3: Setting media_lanemask=0x2

lanes (5,6)
EthernetX/5: Setting host_lanemask=0x30
EthernetX/5: Setting media_lanemask=0x4

lanes (7,8)
EthernetX/7: Setting host_lanemask=0xc0
EthernetX/7: Setting media_lanemask=0x8

The problem is that “subport” is expected and used in xcvrd, xcvrd unit tests and explained in the Port Breakout HLD ; it was not implemented and does not end up in DB as expected, which defaults this value to 0.

(Pdb) self.port_dict["Ethernet194"]
{'index': 25, 'speed': '100000', 'lanes': '451,452', 'admin_status': 'up', 'cmis_state': 'INSERTED', 'cmis_retries': 0, 'cmis_expired': None}

Problem 2

This Pull Request does not play well with all HWSKUs that define breakouts. It does a lookup for every child port in the hwsku.json file - There should be 1 default breakout mode for each front panel port - this change expects every child port in a front panel port to have an entry - which causes an exception. Right now as far as I can tell this will break a lot of HWSKUs in a breakout config.

For example this is what the Dynamic Port Breakout feature uses on an Arista SKU to define 2x200G on the first front panel port:

"Ethernet0": {
     "default_brkout_mode": "2x200G[100G,50G,40G,25G,10G]"
},        

but it currently expects:

"Ethernet0": {
    ...
},    
"Ethernet4": {
    ...
},     

How I did it

Calculate subport value automatically based on the number of child ports used per front panel port and insert it into CONFIG_DB for the interface. If that number is 1 for the front panel, subport 0 is used. If has more than 1 I.e. is in breakout mode use a 1 based subport number as the feature expects. For example:

1x400G-8:

Ethernet0: subport 0

4x100G-2:

Ethernet0: subport 1
Ethernet2: subport 2
Ethernet4: subport 3
Ethernet6: subport 4

I have kept it compatible with Mellanox change - if you want to define a different subport number in your hwsku.json that is fine and it will override the dynamically generated default.

How to verify it

I verified this by checking the output of sonic-cfggen, updating config_db. and checking that both the host and media lane masks were correct for a 400GBASE-DR4 xcvr in both 2x200G-4 mode and 4x100G-2 mode.

All links came up and xcvrd prints the correct lanemasks in logs.

I also made sure that portconfig.py no longer causes an exception; it only checks for the optional HWSKU value if childports are present in hwsku.json file (this is to make sure we don't break Mellanox hwsku's)

I also changed the unit test for config_db generation and ensured the correct values match. The test passes.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202311

Tested branch (Please provide the tested image version)

  • 202305
  • 202311
  • master

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Change calculates the correct subport value for each interface.
It also updates unit tests for correct output
@prgeor
Copy link
Contributor

prgeor commented May 13, 2024

@bobbymcgonigle there is merge conflict

@prgeor
Copy link
Contributor

prgeor commented May 13, 2024

@liat-grozovik please have this reviewed for your platform

prgeor
prgeor previously approved these changes May 13, 2024
Copy link
Contributor

@prgeor prgeor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobbymcgonigle Thanks for taking care of DPB and static DPB cases.

@dgsudharsan dgsudharsan requested a review from vivekrnv May 13, 2024 18:05
@prgeor
Copy link
Contributor

prgeor commented May 15, 2024

@bobbymcgonigle please take care of the build failures

prgeor
prgeor previously approved these changes May 16, 2024
@lguohan
Copy link
Collaborator

lguohan commented May 16, 2024

@bobbymcgonigle , vstest keeps failing, can you check?

@lguohan lguohan merged commit 2c69277 into sonic-net:master May 17, 2024
19 checks passed
@bobbymcgonigle bobbymcgonigle deleted the master-calculate-subport branch May 17, 2024 21:07
prsunny added a commit that referenced this pull request May 22, 2024
@prsunny
Copy link
Contributor

prsunny commented May 22, 2024

@bobbymcgonigle , please look into #19041

@lguohan
Copy link
Collaborator

lguohan commented Jun 4, 2024

@bobbymcgonigle , there is conflict for picking to 202311, can you create a cherry-pick PR?

lguohan pushed a commit that referenced this pull request Jun 5, 2024
This pull request merges #18499 to 202311

Automatically populate the correct subport value for an interface and insert into config_db.json

How to verify it
I verified this by checking the output of sonic-cfggen, updating config_db. and checking that both the host and media lane masks were correct for a 400GBASE-DR4 xcvr in both 2x200G-4 mode and 4x100G-2 mode.

All links came up and xcvrd prints the correct lanemasks in logs.

I also made sure that portconfig.py no longer causes an exception; it only checks for the optional HWSKU value if childports are present in hwsku.json file (this is to make sure we don't break Mellanox hwsku's)

I also changed the unit test for config_db generation and ensured the correct values match. The test passes.
smagarwal-arista pushed a commit to smagarwal-arista/sonic-buildimage that referenced this pull request Jun 14, 2024
…9216)

This pull request merges sonic-net#18499 to 202305

Automatically populate the correct subport value for an interface and insert into config_db.json

How to verify it
I verified this by checking the output of sonic-cfggen, updating config_db. and checking that both the host and media lane masks were correct for a 400GBASE-DR4 xcvr in both 2x200G-4 mode and 4x100G-2 mode.

All links came up and xcvrd prints the correct lanemasks in logs.

I also made sure that portconfig.py no longer causes an exception; it only checks for the optional HWSKU value if childports are present in hwsku.json file (this is to make sure we don't break Mellanox hwsku's)

I also changed the unit test for config_db generation and ensured the correct values match. The test passes.
MuLinForest pushed a commit to MuLinForest/sonic-buildimage that referenced this pull request Aug 6, 2024
…8499)

Automatically populate the correct subport value for an interface and insert into config_db.json

Why I did it
Problem 1
Breakouts of optic CMIS modules are broken. For example when in breakout mode of 4x100G on a port the /3, /5, /7 are not correctly associating with lanes (3,4), (5,6) and (7,8) of the transceiver respectively. The /3, /5, /7 are all trying to reprogram lanes (1,2).

In the xcvrd logs we see the following lane masks for all 4 interfaces for a port in 4x100G-2 mode:

EthernetX/X: Setting host_lanemask=0x3
EthernetX/X: Setting media_lanemask=0x1
As a result, we only see the /1 interface linking up correctly, the other 3 interfaces do not have the correct application advertisement + lanemasks applied. This is what we should expect to see:

lanes (1,2)
EthernetX/1: Setting host_lanemask=0x3
EthernetX/1: Setting media_lanemask=0x1

lanes (3,4)
EthernetX/3: Setting host_lanemask=0xc
EthernetX/3: Setting media_lanemask=0x2

lanes (5,6)
EthernetX/5: Setting host_lanemask=0x30
EthernetX/5: Setting media_lanemask=0x4

lanes (7,8)
EthernetX/7: Setting host_lanemask=0xc0
EthernetX/7: Setting media_lanemask=0x8
The problem is that “subport” is expected and used in xcvrd, xcvrd unit tests and explained in the Port Breakout HLD ; it was not implemented and does not end up in DB as expected, which defaults this value to 0.

(Pdb) self.port_dict["Ethernet194"]
{'index': 25, 'speed': '100000', 'lanes': '451,452', 'admin_status': 'up', 'cmis_state': 'INSERTED', 'cmis_retries': 0, 'cmis_expired': None}
Problem 2
This Pull Request does not play well with all HWSKUs that define breakouts. It does a lookup for every child port in the hwsku.json file - There should be 1 default breakout mode for each front panel port - this change expects every child port in a front panel port to have an entry - which causes an exception. Right now as far as I can tell this will break a lot of HWSKUs in a breakout config.

For example this is what the Dynamic Port Breakout feature uses on an Arista SKU to define 2x200G on the first front panel port:

"Ethernet0": {
     "default_brkout_mode": "2x200G[100G,50G,40G,25G,10G]"
},
but it currently expects:

"Ethernet0": {
    ...
},
"Ethernet4": {
    ...
},
How I did it
Calculate subport value automatically based on the number of child ports used per front panel port and insert it into CONFIG_DB for the interface. If that number is 1 for the front panel, subport 0 is used. If has more than 1 I.e. is in breakout mode use a 1 based subport number as the feature expects. For example:

1x400G-8:

Ethernet0: subport 0
4x100G-2:

Ethernet0: subport 1
Ethernet2: subport 2
Ethernet4: subport 3
Ethernet6: subport 4
I have kept it compatible with Mellanox change - if you want to define a different subport number in your hwsku.json that is fine and it will override the dynamically generated default.

How to verify it
I verified this by checking the output of sonic-cfggen, updating config_db. and checking that both the host and media lane masks were correct for a 400GBASE-DR4 xcvr in both 2x200G-4 mode and 4x100G-2 mode.

All links came up and xcvrd prints the correct lanemasks in logs.

I also made sure that portconfig.py no longer causes an exception; it only checks for the optional HWSKU value if childports are present in hwsku.json file (this is to make sure we don't break Mellanox hwsku's)

I also changed the unit test for config_db generation and ensured the correct values match. The test passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants