Smartswitch Platform Test Plan Document #12701

Merged: 45 commits into sonic-net:master on Sep 9, 2024

Contributor

@nissampa nissampa commented May 2, 2024

Description of PR

The smartSwitch is a next generation of data center switch for T0/T1 roles that now subsumes the DPU. This PR describes test cases to validate the additional platform management functions, such as FPD, console, power management, health, software upgrade, and life-cycle scenarios, that are needed because of the DPUs in the system.

PR Link to Test Case Scripts: #14152

Back port request

  • 201911
  • 202012
  • 202205
  • 202305
  • 202311

linux-foundation-easycla bot commented May 2, 2024

@nissampa nissampa marked this pull request as draft May 2, 2024 20:51
@nissampa nissampa changed the title DPU Test Plan Document Smartswitch Test Plan Document May 3, 2024
@r12f r12f requested review from r12f and zjswhhh May 3, 2024 22:44
@r12f r12f requested a review from prgeor May 3, 2024 22:49
@KrisNey-MSFT

We reviewed the test cases today 5/8/2024. One comment, please change to SONiC-DASH OS for the DPU, and SONiC only for the CPU/NPU :)

@nissampa nissampa marked this pull request as ready for review May 9, 2024 18:31
* The "show reboot-cause history module-name" CLI on the switch shows the history of the specified module
* Use `config chassis modules shutdown <DPU_Number>`
* Use `config chassis modules startup <DPU_Number>`
* Wait for 5 minutes for Pmon to update the dpu states
Contributor

Why do we need to wait 5 minutes? What is the maximum time until the DPU states are updated?

Contributor Author

  • Considering the time needed to power on the DPU, bring up the services on the DPU, and update the chassis DB, we set 5 minutes as the maximum limit.
  • This limit applies to the initial boot-up case. For subsequent operations, state updates are expected to be instantaneous.
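
As an illustration of the point above, a test does not have to sleep for the full 5 minutes. It can poll and treat 5 minutes as an upper bound. The following is a minimal sketch, not the test plan's implementation: `wait_for_dpu_state` and the `get_oper_status` callable are hypothetical names, and the 10-second polling interval is an assumption.

```python
import time
from typing import Callable


def wait_for_dpu_state(
    get_oper_status: Callable[[str], str],  # hypothetical: returns e.g. "Online"
    dpu_name: str,
    expected: str,
    timeout_s: float = 300.0,   # 5-minute ceiling from the discussion above
    interval_s: float = 10.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until the DPU reports `expected` and return the elapsed seconds.

    Raises TimeoutError if the state is not reached within `timeout_s`.
    """
    start = clock()
    while True:
        status = get_oper_status(dpu_name)
        elapsed = clock() - start
        if status == expected:
            return elapsed
        if elapsed >= timeout_s:
            raise TimeoutError(
                f"{dpu_name} is {status!r}, expected {expected!r} "
                f"after {elapsed:.0f}s"
            )
        sleep(interval_s)
```

Returning the elapsed time lets a test log how long the initial boot actually took, which would help answer the reviewer's question about the real maximum.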

@nissampa nissampa changed the title Smartswitch Test Plan Document Smartswitch Platform Test Plan Document May 14, 2024
### 1.8 Check the NTP date and timezone between DPU and NPU

#### Steps
* In Switch, under the file /etc/ntp.conf configure it to use the ntp server and restart ntp.service to configure
Contributor

NTP configuration should be set via config DB. An example of the configuration is in https://github.com/sonic-net/SONiC/blob/master/doc/ntp/ntp-design.md HLD.

Contributor Author

You are right. This test case only checks that the NPU and the DPU agree on the date. It has nothing to do with any configuration.

Contributor

Can you please update the steps? The first step here says that the configuration will be set:

> under the file /etc/ntp.conf configure it to use the ntp server and restart ntp.service

Contributor Author

Changed it.


#### Steps
* In Switch, under the file /etc/ntp.conf configure it to use the ntp server and restart ntp.service to configure
* In DPU, similarly under the ntp configuration use the switches ip as ntp server and restart ntp service to configure
Contributor

Does this mean that SONiC on the switch should run an NTP server? NTP server support is not yet integrated into SONiC. It should be possible to configure the NTP server via Linux config files, but this configuration might conflict with the NTP client configuration that SONiC supports.

If we want to run the NTP server on the switch we need to discuss this with the Microsoft team. @prgeor can you please assist?

Contributor Author

This case has nothing to do with any configuration. It only checks that the date and time zone are the same on the host and on the DPUs. Changed the test case accordingly.

Contributor

Can you please update the steps? The steps say the opposite.

Contributor Author

Changed it.
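
The check that this thread converges on (same date and time zone on the NPU and on each DPU, with no NTP configuration involved) can be sketched as a pure comparison. This is an illustration under stated assumptions: each side is assumed to run `date '+%s %Z'` (epoch seconds and time zone abbreviation), `clocks_in_sync` is a hypothetical helper name, and the 5-second tolerance is an assumed allowance for the two commands not running at the same instant.

```python
def clocks_in_sync(npu_date_out: str, dpu_date_out: str,
                   max_skew_s: int = 5) -> bool:
    """Compare two outputs of `date '+%s %Z'`, e.g. "1715000000 UTC".

    Returns True when both sides report the same time zone abbreviation
    and their epoch timestamps differ by at most `max_skew_s` seconds.
    """
    npu_epoch, npu_tz = npu_date_out.split()
    dpu_epoch, dpu_tz = dpu_date_out.split()
    same_zone = npu_tz == dpu_tz
    skew = abs(int(npu_epoch) - int(dpu_epoch))
    return same_zone and skew <= max_skew_s
```

Epoch seconds do not depend on the time zone, so the time comparison and the time zone comparison stay independent of each other.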

@nissampa
Contributor Author

nissampa commented Aug 7, 2024

> "The smartSwitch is a next generation of data center switch for T0/T1 roles" is the smartSwitch a T0 roles in the next generation of data center?
> @nissampa
> Can you reply the comment?

The default will be T1 roles. There is a plan to use them as T0 switches as well.

@KrisNey-MSFT

hi @prgeor - are you ok to merge this one?

bpar9
bpar9 previously approved these changes Aug 13, 2024
Contributor

@bpar9 bpar9 left a comment

lgtm

@KrisNey-MSFT

@prgeor is OOF for a few days, is it ok if we wait until he returns @nissampa ?

@nissampa
Contributor Author

> @prgeor is OOF for a few days, is it ok if we wait until he returns @nissampa ?

Sure. Just checking in to see when this can be merged.

@KrisNey-MSFT

> > @prgeor is OOF for a few days, is it ok if we wait until he returns @nissampa ?
>
> Sure. Just checking in to see when this can be merged.

I heard @prgeor is back this week, and is going to take a look.

@prgeor
Contributor

prgeor commented Aug 20, 2024

@nissampa SONiC does not have BMC support. Can you remove BMC from the PR description?

@nissampa
Contributor Author

BMC

> @nissampa SONiC does not have BMC support. Can you remove BMC from the PR description?

Removed it.

root@sonic:/home/cisco# show chassis modules status
  Name    Description    Physical-Slot    Oper-Status    Admin-Status    Serial
------  -------------  ---------------  -------------  --------------  --------
  DPU0            N/A               -1         Online              up       N/A
Contributor

@nissampa Please update the output of the CLI

  1. Physical slot should be NA for DPU
  2. Serial Number should not be NA at least not when DPU is online
  3. Description should be "Data Processing Unit"
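
The three expectations above could be checked mechanically against the CLI output. The sketch below is illustrative only: `parse_modules_status` and `check_dpu_row` are hypothetical helpers, the parser assumes columns are separated by at least two spaces, and "NA" in the comment is assumed to mean the literal string `N/A` shown in the output.

```python
import re
from typing import Dict, List


def parse_modules_status(output: str) -> List[Dict[str, str]]:
    """Parse `show chassis modules status` text into one dict per row.

    Assumes a header line starting with "Name", a dashed separator line
    below it, and columns separated by two or more spaces.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    header_idx = next(i for i, ln in enumerate(lines) if ln.startswith("Name"))
    headers = re.split(r"\s{2,}", lines[header_idx])
    rows = []
    for ln in lines[header_idx + 2:]:  # skip the dashed separator line
        cells = re.split(r"\s{2,}", ln)
        if len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows


def check_dpu_row(row: Dict[str, str]) -> List[str]:
    """Return a list of violations of the reviewer's three expectations."""
    problems = []
    if row["Physical-Slot"] != "N/A":
        problems.append(f"{row['Name']}: physical slot should be N/A")
    if row["Oper-Status"] == "Online" and row["Serial"] == "N/A":
        problems.append(f"{row['Name']}: serial missing while online")
    if row["Description"] != "Data Processing Unit":
        problems.append(f"{row['Name']}: unexpected description")
    return problems
```

Returning a list of problems rather than asserting on the first one lets a test report every mismatch for a DPU in a single run.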

```
DPUX N/A -1 Online up N/A
```
#### Pass/Fail Criteria
* Verify number of DPUs from api and number of DPUs shown in the cli output.
Contributor

@nissampa the source of truth for the number of DPUs should ideally be the Ansible inventory file for the testbed. We cannot rely on the APIs that are under test.

The inventory file should specify how many DPUs are expected in a testbed.

https://github.com/sonic-net/sonic-mgmt/blob/master/tests/common/utilities.py#L341 parses the inventory.

Example: system EEPROM info for a testbed is fetched from the inventory here: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/platform_tests/api/test_chassis.py#L249

Contributor Author

@nissampa nissampa Aug 22, 2024

Changed the source of truth to be picked from inventory file instead of from the api.
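
A minimal sketch of the resulting check, with the expected count taken from inventory data instead of the platform API. The inventory key `num_dpu_modules` and both function names are hypothetical and used only for illustration. How the inventory variables are loaded is left out.

```python
import re
from typing import Mapping


def count_dpus_in_cli(output: str) -> int:
    """Count rows of `show chassis modules status` whose Name is DPU<N>."""
    return sum(
        1 for line in output.splitlines() if re.match(r"\s*DPU\d+\s", line)
    )


def verify_dpu_count(inventory_vars: Mapping[str, object],
                     cli_output: str) -> None:
    """Fail when the CLI shows a different number of DPUs than the inventory."""
    # "num_dpu_modules" is a hypothetical inventory key, not a confirmed one.
    expected = int(inventory_vars["num_dpu_modules"])
    actual = count_dpus_in_cli(cli_output)
    assert actual == expected, (
        f"inventory expects {expected} DPUs, CLI shows {actual}"
    )
```

Keeping the expected value outside the code path under test means a platform API that reports the wrong number of DPUs cannot make the test pass by agreeing with itself.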

```
root@sonic:/home/cisco#
```
#### Pass/Fail Criteria
Contributor

@nissampa What is expected for DPUs that are offline or admin down?

Contributor Author

It will display the output as 0 for DPUs that are offline or admin down.

Contributor

@nissampa can you capture the output of this CLI on the DPU? This will set the expectation.

Contributor Author

Discussed in the mail thread that we are focussing on NPU side first.

```
On Switch:

root@sonic:/home/cisco# show platform voltage
```
Contributor

@nissampa What is the expected output when this CLI runs on the DPU host?

Contributor Author

It will display the readings that belong to that DPU host.

@nissampa nissampa requested a review from prgeor August 22, 2024 22:15
@nissampa nissampa requested a review from prgeor August 27, 2024 20:25
@prgeor prgeor merged commit 94e5525 into sonic-net:master Sep 9, 2024
4 checks passed
@prgeor
Contributor

prgeor commented Sep 9, 2024

@nissampa could you please update the PR description with the corresponding sonic-mgmt code PR?

@nissampa
Contributor Author

nissampa commented Sep 9, 2024

> @nissampa could you please update the PR description with the corresponding sonic-mgmt code PR?

Updated it.

@KrisNey-MSFT

Woot!

hdwhdw pushed a commit to hdwhdw/sonic-mgmt that referenced this pull request Sep 20, 2024
* Create DPU-test-plan.md

* Rename DPU-test-plan.md to Smartswitch-test-plan.md

* Update Smartswitch-test-plan.md
9 participants