[pfcwd] For zero buffer pfcwd detection logic, verify forward action on Rx #5665

neethajohn · 2022-05-17T19:07:29Z

Signed-off-by: Neetha John <nejo@microsoft.com>

Description of PR

For platforms that use the zero buffer detection logic for pfcwd, modify the testcase to verify that ingress traffic is forwarded rather than dropped. Related to sonic-net/sonic-swss#2279
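
A minimal sketch of the expectation change described above, under stated assumptions: `ZERO_BUFFER_ASICS` and `expected_drop_action_checks` are hypothetical names introduced for illustration, and the contents of the set are an assumption based on the Mellanox verification mentioned in this PR. It is not the actual test code.

```python
# Hypothetical set of ASIC types that use the zero buffer detection logic.
# Only "mellanox" is listed because that is the platform this PR was run on.
ZERO_BUFFER_ASICS = frozenset(["mellanox"])


def expected_drop_action_checks(asic_type):
    """Return (tx_egress, rx_ingress) expectations for wd_action 'drop'.

    Egress on the stormed queue is still expected to drop. Ingress is
    expected to be forwarded on zero buffer platforms and dropped elsewhere.
    """
    rx_ingress = "forward" if asic_type in ZERO_BUFFER_ASICS else "drop"
    return ("drop", rx_ingress)


if __name__ == "__main__":
    print(expected_drop_action_checks("mellanox"))
    print(expected_drop_action_checks("some_other_asic"))
```

Keeping the platform list in one place means a later platform that adopts the same behavior only needs a one-line change.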

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

201911
202012

How did you verify/test it?

Ran the test with these changes on the Mellanox platform, and it passed.

…n pfc storm is detected (#2279)

What I did

According to the current pfcwd detection logic on certain platforms, when the watchdog fires, we create an ingress zero pool, ingress zero profile, egress zero pool, and egress zero profile (if not already created), then attach the ingress zero profile to the ingress PG and the egress zero profile to the egress queue. As a result, traffic ingressing that port/PG and egressing that port/queue gets dropped.

The current changes avoid dropping traffic that is ingressing the port/PG that is in storm. The code changes in this PR avoid creating the ingress zero pool and profile and do not attach any zero profile to the ingress PG when pfcwd is triggered.

How I verified it

Modified the pfcwd func tests (sonic-net/sonic-mgmt#5665), and the testcase passed on the Mellanox platform.

Signed-off-by: Neetha John <nejo@microsoft.com>
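
The before/after behavior described in this commit message can be modeled in a few lines. This is an illustrative Python model only: the real change lives in sonic-swss, and the function names and dictionary keys here are hypothetical.

```python
def attached_zero_profiles(attach_ingress):
    """Return the zero profiles attached when the watchdog fires.

    attach_ingress=True models the old behavior (ingress PG and egress queue),
    attach_ingress=False models the behavior after this change (egress only).
    """
    attached = {"egress_queue": "egress_zero_profile"}
    if attach_ingress:
        attached["ingress_pg"] = "ingress_zero_profile"
    return attached


def dropped_directions(attached):
    """List the traffic directions that get dropped for a set of attachments."""
    return sorted(key.split("_")[0] for key in attached)


if __name__ == "__main__":
    print(dropped_directions(attached_zero_profiles(attach_ingress=True)))
    print(dropped_directions(attached_zero_profiles(attach_ingress=False)))
```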

vivekrnv · 2022-06-21T17:48:14Z

Hi @neethajohn,

Looks like the test_pfcwd_warm_reboot.py has to be updated as well with the new drop actions on ingress.

pfcwd/test_pfcwd_warm_reboot.py::TestPfcwdWb::test_pfcwd_wb[no_storm] 
-------------------------------------------------------------------------------------------------------------- live log setup ---------------------------------------------------------------------------------------------------------------
15:15:16 INFO __init__.py:set_default:49: Completeness level not set during test execution. Setting to default level: CompletenessLevel.basic
15:15:16 INFO __init__.py:check_test_completeness:139: Test has no defined levels. Continue without test completeness checks
15:15:31 ERROR facts_cache.py:read:87: Load json file "/root/mars/workspace/sonic-mgmt/tests/_cache/arc-switch1004/basic_facts.json" failed with exception: IOError(2, 'No such file or directory')
15:15:36 INFO facts_cache.py:write:106: Create cache dir /root/mars/workspace/sonic-mgmt/tests/_cache/arc-switch1004
15:15:36 INFO facts_cache.py:write:112: Cached facts "arc-switch1004.basic_facts" under /root/mars/workspace/sonic-mgmt/tests/_cache/arc-switch1004
15:15:38 INFO ptfhost_utils.py:change_mac_addresses:86: Change interface MAC addresses on ptfhost 'ptf-1004'
15:15:40 INFO ptfhost_utils.py:copy_ptftests_directory:48: Copy PTF test files to PTF host 'ptf-1004'
15:16:05 INFO conftest.py:generate_params_dut_hostname:675: DUTs in testbed 'arc-switch1004-t0' are: ['arc-switch1004']
15:16:05 INFO conftest.py:creds:362: dut arc-switch1004 belongs to groups [u'lab', u'leaf_topo_1', u'sonic', u'sonic_latest', 'fanout']
15:16:05 INFO conftest.py:creds:374: skip empty var file ../ansible/group_vars/all/env.yml
15:16:05 INFO conftest.py:creds:374: skip empty var file ../ansible/group_vars/all/corefile_uploader.yml
15:16:08 INFO __init__.py:sanity_check:46: Start pre-test sanity check
15:16:08 INFO __init__.py:sanity_check:56: Found marker: m.name=disable_loganalyzer, m.args=(), m.kwargs={}
15:16:08 INFO __init__.py:sanity_check:56: Found marker: m.name=topology, m.args=('any',), m.kwargs={}
15:16:08 INFO __init__.py:sanity_check:90: Sanity check settings: skip_sanity=True, check_items=set(['monit', 'processes', 'interfaces', 'bgp', 'services', 'dbmemory']), allow_recover=True, recover_method=adaptive, post_check=False
15:16:08 INFO __init__.py:sanity_check:93: Skip sanity check according to command line argument or configuration of test script.
15:16:08 ERROR facts_cache.py:read:87: Load json file "/root/mars/workspace/sonic-mgmt/tests/_cache/arc-switch1004/mg_facts.json" failed with exception: IOError(2, 'No such file or directory')
15:16:09 INFO facts_cache.py:write:112: Cached facts "arc-switch1004.mg_facts" under /root/mars/workspace/sonic-mgmt/tests/_cache/arc-switch1004
15:16:14 INFO conftest.py:setup_pfc_test:101: --- Stopping Pfcwd ---
15:16:16 INFO __init__.py:loganalyzer:17: Log analyzer is disabled
15:16:16 INFO test_pfcwd_warm_reboot.py:setup_pfcwd:49: Setup the default pfcwd config for warm-reboot test
--------------------------------------------------------------------------------------------------------------- live log call ---------------------------------------------------------------------------------------------------------------
15:16:24 INFO test_pfcwd_warm_reboot.py:test_pfcwd_wb:547: --- Test PFC storm detect/restore before and after warm boot ---
15:16:24 INFO test_pfcwd_warm_reboot.py:pfcwd_wb_helper:486: 
15:16:24 INFO test_pfcwd_warm_reboot.py:pfcwd_wb_helper:487: --- Testing on Ethernet10 ---
15:16:24 INFO test_pfcwd_warm_reboot.py:setup_test_params:97: --- Setting up test params for port Ethernet10 ---
15:16:45 INFO test_pfcwd_warm_reboot.py:run_test:388: --- Storm detection path for port Ethernet10 queue 4 ---
15:16:52 INFO pfc_storm.py:start_storm:196: --- Starting PFC storm on arc-switch1005 on interfaces ethernet 1/3/2 on queue 4 ---
15:17:14 INFO test_pfcwd_warm_reboot.py:storm_detect_path:317: Verify if PFC storm is detected on port Ethernet10 queue 4
15:17:20 INFO test_pfcwd_warm_reboot.py:verify_wd_func:270: --- Verify PFCwd function for action drop ---
15:17:20 INFO test_pfcwd_warm_reboot.py:verify_tx_egress:216: Check for egress drop on Tx port Ethernet10
15:17:22 INFO test_pfcwd_warm_reboot.py:verify_rx_ingress:241: Check for ingress drop on Rx port Ethernet10






RunAnsibleModuleFail: run module shell failed, Ansible Results =>
E           {
E               "changed": true, 
E               "cmd": "ptf --test-dir ptftests pfc_wd.PfcWdTest --platform-dir ptftests --platform remote -t 'port_type='\"'\"'vlan'\"'\"';port_src=5;ip_dst=u'\"'\"'10.0.0.57'\"'\"';router_mac=u'\"'\"'ec:0d:9a:fa:c1:00'\"'\"';wd_action='\"'\"'drop'\"'\"';pkt_count=100;port_dst='\"'\"'[28]'\"'\"';queue_index=4' --relax --debug info --log-file /tmp/pfc_wd.PfcWdTest.2022-06-20-15:17:22.log", 
E               "delta": "0:00:00.941062", 
E               "end": "2022-06-20 15:17:23.132121", 
E               "failed": true, 
E               "invocation": {
E                   "module_args": {
E                       "_raw_params": "ptf --test-dir ptftests pfc_wd.PfcWdTest --platform-dir ptftests --platform remote -t 'port_type='\"'\"'vlan'\"'\"';port_src=5;ip_dst=u'\"'\"'10.0.0.57'\"'\"';router_mac=u'\"'\"'ec:0d:9a:fa:c1:00'\"'\"';wd_action='\"'\"'drop'\"'\"';pkt_count=100;port_dst='\"'\"'[28]'\"'\"';queue_index=4' --relax --debug info --log-file /tmp/pfc_wd.PfcWdTest.2022-06-20-15:17:22.log", 
E                       "_uses_shell": true, 
E                       "argv": null, 
E                       "chdir": "/root", 
E                       "creates": null, 
E                       "executable": null, 
E                       "removes": null, 
E                       "stdin": null, 
E                       "stdin_add_newline": true, 
E                       "strip_empty_ends": true, 
E                       "warn": true
E                   }
E               }, 
E               "msg": "non-zero return code", 
E               "rc": 1, 
E               "start": "2022-06-20 15:17:22.191059", 
E               "stderr": "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.\n  from cryptography.hazmat.backends import default_backend\npfc_wd.PfcWdTest ... FAIL\n\n======================================================================\nFAIL: pfc_wd.PfcWdTest\n----------------------------------------------------------------------\nTraceback (most recent call last):\n  File \"ptftests/pfc_wd.py\", line 113, in runTest\n    return verify_no_packet_any(self, masked_exp_pkt, dst_port_list)\n  File \"/usr/lib/python2.7/dist-packages/ptf/testutils.py\", line 2476, in verify_no_packet_any\n    verify_no_packet(test, pkt, (device, port))\n  File \"/usr/lib/python2.7/dist-packages/ptf/testutils.py\", line 2422, in verify_no_packet\n    \"port %r.\\n%s\" % (device, port, result.format()))\nAssertionError: Received packet that we expected not to receive on device 0, port 28.\n========== RECEIVED ==========\n0000   26 42 D5 7A 41 34 EC 0D  9A FA C1 00 08 00 45 11   &B.zA4........E.\n0010   00 56 00 01 00 00 3F 06  6F 56 01 01 01 01 0A 00   .V....?.oV......\n0020   00 39 49 E3 C3 98 00 00  00 00 00 00 00 00 50 02   .9I...........P.\n0030   20 00 79 EB 00 00 00 01  02 03 04 05 06 07 08 09    .y.............\n0040   0A 0B 0C 0D 0E 0F 10 11  12 13 14 15 16 17 18 19   ................\n0050   1A 1B 1C 1D 1E 1F 20 21  22 23 24 25 26 27 28 29   ...... !\"#$%&'()\n0060   2A 2B 2C 2D                                        *+,-\n==============================\n\n\n----------------------------------------------------------------------\nRan 1 test in 0.030s\n\nFAILED (failures=1)", 
E               "stderr_lines": [
E                   "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.", 
E                   "  from cryptography.hazmat.backends import default_backend", 
E                   "pfc_wd.PfcWdTest ... FAIL", 
E                   "", 
E                   "======================================================================", 
E                   "FAIL: pfc_wd.PfcWdTest", 
E                   "----------------------------------------------------------------------", 
E                   "Traceback (most recent call last):", 
E                   "  File \"ptftests/pfc_wd.py\", line 113, in runTest", 
E                   "    return verify_no_packet_any(self, masked_exp_pkt, dst_port_list)", 
E                   "  File \"/usr/lib/python2.7/dist-packages/ptf/testutils.py\", line 2476, in verify_no_packet_any", 
E                   "    verify_no_packet(test, pkt, (device, port))", 
E                   "  File \"/usr/lib/python2.7/dist-packages/ptf/testutils.py\", line 2422, in verify_no_packet", 
E                   "    \"port %r.\\n%s\" % (device, port, result.format()))", 
E                   "AssertionError: Received packet that we expected not to receive on device 0, port 28.", 
E                   "========== RECEIVED ==========", 
E                   "0000   26 42 D5 7A 41 34 EC 0D  9A FA C1 00 08 00 45 11   &B.zA4........E.", 
E                   "0010   00 56 00 01 00 00 3F 06  6F 56 01 01 01 01 0A 00   .V....?.oV......", 
E                   "0020   00 39 49 E3 C3 98 00 00  00 00 00 00 00 00 50 02   .9I...........P.", 
E                   "0030   20 00 79 EB 00 00 00 01  02 03 04 05 06 07 08 09    .y.............", 
E                   "0040   0A 0B 0C 0D 0E 0F 10 11  12 13 14 15 16 17 18 19   ................", 
E                   "0050   1A 1B 1C 1D 1E 1F 20 21  22 23 24 25 26 27 28 29   ...... !\"#$%&'()", 
E                   "0060   2A 2B 2C 2D                                        *+,-", 
E                   "==============================", 
E                   "", 
E                   "", 
E                   "----------------------------------------------------------------------", 
E                   "Ran 1 test in 0.030s", 
E                   "", 
E                   "FAILED (failures=1)"
E               ], 
E               "stdout": "", 
E               "stdout_lines": []

E           }
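
To check that the unexpected packet in the failure above is the test traffic forwarded by the DUT, the first 34 bytes of the RECEIVED dump can be decoded by hand. The sketch below uses only the Python standard library; `decode_headers` and `format_mac` are helper names made up for this illustration, and the byte string is copied from the log.

```python
import socket
import struct

# Ethernet + IPv4 header bytes copied from the RECEIVED dump in the log above.
RECEIVED_HEX = (
    "26 42 D5 7A 41 34 EC 0D 9A FA C1 00 08 00 45 11 "
    "00 56 00 01 00 00 3F 06 6F 56 01 01 01 01 0A 00 "
    "00 39"
)


def format_mac(mac_bytes):
    """Render 6 raw bytes as a lowercase colon-separated MAC address."""
    return ":".join("%02x" % b for b in mac_bytes)


def decode_headers(hex_dump):
    """Decode the Ethernet header and the fixed 20-byte IPv4 header."""
    raw = bytes.fromhex(hex_dump)
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", raw[:14])
    (_ver_ihl, tos, _total_len, _ident, _frag, ttl, proto, _csum,
     src_ip, dst_ip) = struct.unpack("!BBHHHBBH4s4s", raw[14:34])
    return {
        "dst_mac": format_mac(dst_mac),
        "src_mac": format_mac(src_mac),
        "ethertype": ethertype,
        "dscp": tos >> 2,  # upper six bits of the ToS byte
        "ttl": ttl,
        "protocol": proto,
        "src_ip": socket.inet_ntoa(src_ip),
        "dst_ip": socket.inet_ntoa(dst_ip),
    }


if __name__ == "__main__":
    for field, value in sorted(decode_headers(RECEIVED_HEX).items()):
        print(field, value)
```

The decoded source MAC equals the `router_mac` test parameter and the destination IP equals `ip_dst`, so the packet was sent out by the DUT toward the expected destination. That is consistent with ingress traffic now being forwarded while the warm-reboot test still calls `verify_no_packet_any`.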

…storm is detected (#2304)

What I did

Avoid dropping traffic that is ingressing the port/PG that is in storm. The code changes in this PR avoid creating the ingress zero pool and profile and do not attach any zero profile to the ingress PG when pfcwd is triggered.

Revert the changes related to #1480, where a retry mechanism was added to BufferOrch that caches task retries while the PG is locked by PfcWdZeroBufferHandler.

Revert the changes related to #2164 in PfcWdZeroBufferHandler, ZeroBufferProfile, and BufferOrch.

Updated the UTs accordingly.

How I verified it

UTs. Ran the sonic-mgmt tests with these changes (sonic-net/sonic-mgmt#5665) and verified that they passed.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>

For cisco-8000 platforms, set the forward action on Rx in the presence of pfc-wd. This change follows #5665.

wpa_supplicant may emit the following log during the rekey action: `ERR macsec#wpa_supplicant[15]: KaY: Life time has not elapsed since prior SAK distributed`. The MKA session does eventually recover; otherwise the test cases themselves would fail, not just the log checker.

How did you do it?
Add `r, ".* ERR macsec#wpa_supplicant.*KaY: Life time has not elapsed since prior SAK distributed.*"` to loganalyzer_common_ignore.txt.

Signed-off-by: Ze Gan <ganze718@gmail.com>

commit 6658606edff2694f4832fe06c69ad8ac5e1452d8
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Mon Aug 1 16:42:13 2022 +0800

Add test_lag_db_status and test_lag_db_status_with_po_update back (#6074)

What is the motivation for this PR?
Add back test_lag_db_status and test_lag_db_status_with_po_update, which were reverted in #6058.

How did you do it?
Enhance these two cases to support the 202012 kvm testbed: use wait_until instead of checking interface status immediately after shutdown or no shutdown.

How did you verify/test it?
Verified test_lag_db_status and test_lag_db_status_with_po_update on these testbeds:
* kvm testbed with master image
* kvm testbed with 202012 image
* T0
* T1-lag
* Dualtor

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>

commit 3ea15a685ed58cc93b78e8d9098ff687cd41c8fc
Merge: 17720c67 fa259b22
Author: siqbal1986 <shahzad.iqbal@gmail.com>
Date: Sun Jul 31 15:37:26 2022 -0700

Merge branch 'master' into bfd_test_multihop

commit 17720c671dd125b6ef3237b28a893648b582afe1
Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl>
Date: Sun Jul 31 15:21:47 2022 -0700

minor changes suggested in review.

commit fa259b22248a957c20ff38abe5c1c82718389fe9
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Fri Jul 29 04:05:16 2022 +0800

Revert 2 test cases in test_lag_2 and reduce t1-lag running time (#6058)

What is the motivation for this PR?
test_lag_db_status_with_po_update runs successfully on the master image, but fails on the 202012 image.
Because for 202012, there is no netdev_oper_status key in PORT_TABLE ofSTATE_DB, the test case chooses oper_status in APPL_DB , but sync time seems to be different, after shutdown, it checks the status of oper_status immediately, it should wait for a while until the status is correct. Revert these 2 cases firstly, will submit a new PR to add them after enough verification on 202012 and master image. How did you do it? Two changes in this PR: Revert test_lag_db_status and test_lag_db_status_with_po_update. Use --completeness_level=confident to reduce running time, it will pick up 4 ports, not all of them for t1-lag. How did you verify/test it? Run pc/test_lag_2.py Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 5153efb5b98cad7e21f61bcd01cc7b4cfcc84582 Author: Xin Wang <xiwang5@microsoft.com> Date: Thu Jul 28 16:11:29 2022 +0800 Disable CI testing keep only PR testing (#6061) What is the motivation for this PR? Currently the pipeline has both PR and CI testing enabled. After a PR is merged, the pipeline is triggered again. This is a waste of resource. How did you do it? This change disabled CI testing. The PR testing will be still triggered as usual. Signed-off-by: Xin Wang <xiwang5@microsoft.com> commit 15aa60774a529862c2c348f8af5435a3afc61d0a Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Thu Jul 28 14:52:01 2022 +0800 Remove 'active_tor_mac' from fixture dualtor_info (#6053) PR #5923 updated fixture dualtor_info to add active_tor_mac. However, it caused issue for some other test cases because the return dict is directly passed to check_tunnel_balance, and the function complained that ``` File "/azp/agent/_work/5/s/tests/dualtor/test_orchagent_standby_tor_downstream.py", line 204, in test_standby_tor_downstream_loopback_route_readded check_tunnel_balance(**params) TypeError: check_tunnel_balance() got an unexpected keyword argument 'active_tor_mac' ``` This PR addressed the issue by removing active_tor_mac from the fixture. 
The variable is added for test case test_encap_dscp_rewrite and test_bounced_back_traffic_in_expected_queue What is the motivation for this PR? This PR is to fix issue caused by updating fixture dualtor_info. How did you do it? Remove the newly added variable active_tor_mac from fixture dualtor_info. How did you verify/test it? The change is verified by running test cases test_standby_tor_downstream_loopback_route_readded and test_tunnel_qos_remap.py. commit e5daa2aa0c11c949290b2a5e576ad66798cbb81d Author: Liu Shilong <shilongliu@microsoft.com> Date: Thu Jul 28 14:37:42 2022 +0800 [ci] Transfer organization from Azure to sonic-net (#6059) Description of PR Summary: Transfer organization from Azure to sonic-net commit 60886753bd4e4ddb582438c9a0f09711699e792d Author: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> Date: Thu Jul 28 08:22:53 2022 +0300 Fix vrf "KeyError: 'target_dest_mac'" (#5785) Vrf tests cases fail with KeyError 'target_dest_mac', this issue started to appear after PR - 5456 Changes made in this PR should fix KeyError issue in vrf TC "fib_test.FibTest ... 
ERROR", "", "======================================================================", "ERROR: fib_test.FibTest", "----------------------------------------------------------------------", "Traceback (most recent call last):", " File \"ptftests/fib_test.py\", line 458, in runTest", " self.check_ip_ranges()", " File \"ptftests/fib_test.py\", line 163, in check_ip_ranges", " self.check_ip_range(ip_range, dut_index, ipv4)", " File \"ptftests/fib_test.py\", line 204, in check_ip_range", " self.check_ip_route(src_port, dst_ip, exp_ports, ipv4)", " File \"ptftests/fib_test.py\", line 237, in check_ip_route", " res = self.check_ipv4_route(src_port, dst_ip_addr, dst_port_list)", " File \"ptftests/fib_test.py\", line 267, in check_ipv4_route", " router_mac = self.ptf_test_port_map[str(src_port)]['target_dest_mac']", "KeyError: 'target_dest_mac'", Signed-off-by: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> commit 7c7d5a25ce7cf4a29820a6de86ea9227a62b64de Author: Soumya Velamala <87676006+svelamal@users.noreply.github.com> Date: Wed Jul 27 22:20:07 2022 -0700 Update ip_in_ip_tunnel_test.py (#5853) In test_orchagent_standby_tor_downstream.py, DF bit is set from Cisco-8000 silicon one ASIC since fragmentation on the encapsulated packet is not supported and the expected packet doesn't have it set. This causes the tests to fail despite receiving the complete expected packets. In reference to https://datatracker.ietf.org/doc/html/rfc2003, the outer packet can have the DF bit set when the inner packet does not. Identification, Flags, Fragment Offset These three fields are set as specified in [10]. However, if the "Don't Fragment" bit is set in the inner IP header, it MUST be set in the outer IP header; if the "Don't Fragment" bit is not set in the inner IP header, it MAY be set in the outer IP header, as described in Section 5.1. 
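The RFC 2003 rule quoted in #5853 above can be written as a small predicate. This is only a sketch under stated assumptions: `outer_df_is_acceptable` and `DF_FLAG` are hypothetical names chosen for illustration, and the code is not taken from `ip_in_ip_tunnel_test.py`. It treats the IPv4 flags as the 3-bit field from the header, where the DF bit has value 0x2.

```python
# IPv4 flags are a 3-bit field: 0b100 reserved, 0b010 DF, 0b001 MF.
DF_FLAG = 0x2  # assumption: flags are passed as the raw 3-bit value


def outer_df_is_acceptable(inner_flags, outer_flags):
    """Return True if the outer header's DF bit is allowed by RFC 2003.

    Hypothetical helper for illustration only.
    - inner DF set   -> outer DF MUST be set
    - inner DF clear -> outer DF MAY be set (either value is fine)
    """
    inner_df = bool(inner_flags & DF_FLAG)
    outer_df = bool(outer_flags & DF_FLAG)
    if inner_df:
        return outer_df
    return True
```

A comparison built on this rule would accept an encapsulated packet whose outer DF bit is set while the inner one is clear, which is the Cisco-8000 case the commit describes.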
commit a5ce7f33d6800005b5146ea7a53743711c0d89de Author: slutati1536 <69785882+slutati1536@users.noreply.github.com> Date: Thu Jul 28 08:18:31 2022 +0300 Remove ONIE downgrade from fwutil tests (#5920) What is the motivation for this PR? We had a bug where the switch doesn't reboot back to sonic after ONIE update. Upon further investigation this bug reproduces on a downgrade from ONIE 5.3.007 to a prior version of ONIE. The cause for this bug is that the current ONIE release (ONIE 5.3.007) has a new E2FS package that is not compatible with the old E2FS package version that existed in previous ONIE versions. As we do not expect this flow to occur in production, only in testing, it was decided to remove the ONIE downgrade from fwutil tests. How did you do it? By removing the ONIE from the random components used in the tests. How did you verify/test it? Run the fwutil tests with the change commit 327a2850029347ecfb8b2f3aab316de60127acbc Author: lipxu <108326363+lipxu@users.noreply.github.com> Date: Thu Jul 28 13:14:55 2022 +0800 Fix test_buffer_deployment case NoneType issue (#6050) What is the motivation for this PR? Case test_buffer_deployment failed on the Broadcom devices. How did you do it? Init lossless headroom data for non-mellanox device How did you verify/test it? Re-run the failure case commit ff373cc764a6214e4401c3e9bdf497567e815be8 Author: Anton Ptashnik <antonx.ptashnik@intel.com> Date: Thu Jul 28 07:35:47 2022 +0300 Skipped test_pfc_asym_off_rx_pause_frames for Barefoot platform (#6047) commit fcfe9ebe91151bd54da5bc186d64b7a1d2290ece Author: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com> Date: Thu Jul 28 07:35:15 2022 +0300 Added ignoring expected errors in test_add_rack TC (#6038) What is the motivation for this PR? There are scenarios, where TC applies patch by using config_updater, but updating is still considered as failed: ``` Failed to apply patch Usage: config apply-patch [OPTIONS] PATCH_FILE_PATH Try "config apply-patch -h" for help. 
Error: After applying patch to config, there are still some parts not updated ``` Steps from generic_patch.py module cover this case, but TC still failed due to errors of sonic_yang in syslog. So was added fixture which provides an approach for ignoring those errors. How did you do it? Added fixture which provides an approach for ignoring expected error messages of syslog in test_add_rack TC. How did you verify/test it? Run test cases. Tests passed. configlet/test_add_rack.py::test_add_rack PASSED [100%] Signed-off-by: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com> commit 7e7bb7415f6fb8985ad34e2e1df6b442bd0f7750 Author: andywongarista <78833093+andywongarista@users.noreply.github.com> Date: Wed Jul 27 21:34:01 2022 -0700 [platform_tests/api] Fix reading voltage and max_supp_power issue of the psu test (#6013) What is the motivation for this PR? Fix issues in psu test that are causing failures on Arista platforms How did you do it? Fix test_power to check if reading voltage is supported for psu Fix test_power to correctly check against max_supp_power How did you verify/test it? Tested on Arista platform commit 56b57a2f9a1026937ee2f27a654fe3d0b93f98d6 Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Thu Jul 28 07:32:13 2022 +0300 [VXLAN] Stabilized VNET VXLAN test in case of scale (#6009) Changed logic to do sleep for longer time after applying VNET VXLAN configs, because current logic does not work in case of scale for example with 330000 vnet routes. Now in case of scale we will wait some time until all configuration applied What is the motivation for this PR? When run test tests/vxlan/test_vnet_vxlan.py with scale num routes 33000 test will fail, because not all routes applied and Wr_ARP test also will fail because warm-reboot does not happen (configuration still in progress). Now this issue fixed. How did you do it? Added sleep in case of scale How did you verify/test it? 
Executed test tests/vxlan/test_vnet_vxlan.py with and without scale Signed-off-by: Petro Pikh <petrop@nvidia.com> commit b60dcddded76780365c89a82abd98907f8943b9a Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Thu Jul 28 09:23:25 2022 +0800 [testplan] Add testplan for tunnel QoS remapping (#5508) * Add testplan for tunnel QoS remapping Signed-off-by: bingwang <wang.bing@microsoft.com> commit debe6495bacc7022dd099571f8709e7113029616 Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com> Date: Wed Jul 27 18:04:30 2022 -0700 [MASIC] [PR checks] [mgmt] Add sonic-mgmt PR check (#6043) * Add 'image' option to get sonic-vs-4asic.img * Add multi-asic job * run multi-asic-t1-lag-pr for now which only runs test_bgp_fact.py * Fix space * update dutname to vlab-08 * Update azure-pipelines.yml for Azure Pipelines * Revert back deleted jobs * Add back empty line * Update azure-pipelines.yml * Update displayname * Align format * Make the PR check optional for now * Remove spaces * Remove spaces commit 2da3aad03cd74828f635e7b1d9e7999ea5ea91f3 Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com> Date: Thu Jul 28 08:44:05 2022 +0800 [QoS] Verify the additional lossless queues and PGs in QoS test in dual ToR scenario (#5947) * Provide a WA to verify the additional lossless queues and PGs in QoS test in dual ToR scenario 1. Add a CLI option to specify whether it is to verify ports with or without additional lossless PGs and queues. It will collect all the dual ToR ports at the beginning of the test. If the option is on, the additional lossless PGs/queues will be verified in the corresponding tests 2. Update the logic to fetch buffer profiles from BUFFER_QUEUE and BUFFER_PG table according to the CLI 2. Two additional profiles are introduced for verifying the additional lossless PGs/queues in XON and XOFF test if the CLI option is on 3. For headroom pool test, all 4 DSCPs are passed to PTF script as well as the CLI option. 
The additional DSCPs will be skipped by the PTF docker according to whether the ports are with additional lossless PGs/queues. 4. For DSCP2queue mapping test, the CLI option is passed to PTF script so that the latter is able to check the mapping according to the CLI option. Signed-off-by: Stephen Sun <stephens@nvidia.com> commit 6af5158a3c7df54c939f1273ab03a1f306246ef7 Author: siqbal1986 <shahzad.iqbal@gmail.com> Date: Wed Jul 27 10:41:46 2022 -0700 BFD test update (#5836) Updated BFD test commit c7d19480a23cad967ffc4dee9f799bbccde6e56c Author: mannytaheri <86314901+mannytaheri@users.noreply.github.com> Date: Wed Jul 27 04:46:59 2022 -0400 Added support for returning failure reason if config_system_checks_passed fails (#5975) What is the motivation for this PR? One of the tasks performed by config_system_checks_passed definition is to check if the dut is running. This is done by executing the command "systemctl is-system-running". It returns "False" If the dut is not running but it does not check why the dut is not running. We need to add support for checking the failure reason How did you do it? Execute the command "systemctl is-system-running". Pass if dut is running. Execute the command "systemctl list-units --state=failed" if dut is not running. This will provide the failure reason. Example of failure reason: In this case tacacs-config.timer loaded failed base.py:82 /data/tests/common/devices/multi_asic.py::_run_on_asics#100: [ixre-cpm-chassis10] AnsibleModule::shell Result => {"stderr_lines": [], "cmd": "systemctl list-units --state=failed", "end": "2022-07-09 10:12:18.720054", "_ansible_no_log": false, "stdout": "UNIT LOAD ACTIVE SUB DESCRIPTION\n\u25cf tacacs-config.timer loaded failed failed Delays tacacs apply until SONiC has started\n\n LOAD = Reflects whether the unit definition was properly loaded.\nACTIVE = The high-level unit activation state, i.e. 
generalization of SUB.\n SUB = The low-level unit activation state, values depend on unit type.\n1 loaded units listed." How did you verify/test it? Tested the code on a dut when it was in the "running" state Tested the code on a dut when it was in the "degraded" state commit d4edaa3262c5bb83be7990de0e0fb9598b842886 Author: Ze Gan <ganze718@gmail.com> Date: Wed Jul 27 16:42:41 2022 +0800 [loganalyzer]Add ignore log for wpa_supplicant (#5887) #### What is the motivation for this PR? We use SIGINT to stop the wpa_supplication in macsecmgmr. but may get the following log: `wpa_supplicant[388]: eloop: could not process SIGINT or SIGTERM in two seconds. Looks like there#012is a bug that ends up in a busy loop that prevents clean shutdown.#012Killing program forcefully.` #### How did you do it? Add this log to ignore list for log analyzer. Signed-off-by: Ze Gan <ganze718@gmail.com> commit 9f45f195209b707891b21534d666109dd58f7be0 Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Wed Jul 27 11:40:02 2022 +0300 [sensors] Added new mlnx platforms which have different sensors (#6010) What is the motivation for this PR? Mlnx platforms: 3700, 3700c, 4600c could have different sensors, added logic which allow to test 3700, 3700c ,4600c with different sensors How did you do it? Added new sensors data and improved test to support different sensors(platforms) How did you verify/test it? Executed test_sensors.py Any platform specific information? msn3700, msn3700c, msn4600c Signed-off-by: Petro Pikh <petrop@nvidia.com> commit cf8fa2549d5440c62abe46c2a902cf41e01cfd61 Author: Nick Wang <sh_wang@edge-core.com> Date: Wed Jul 27 16:36:51 2022 +0800 Move test_unknown_mac.py to ARP directory because it it not related to (#6018) For test_unknown_mac.py, it is to test unknown MAC (exists in L2 FDB table but not in ARP table), which is not related to PFC feature. Therefore, the test file should be move to proper directory (ARP). 
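The failure-reason flow described in #5975 above (run `systemctl is-system-running`, then `systemctl list-units --state=failed` when the system is not running) might look roughly like the sketch below. The function name `system_running_with_reason` and the injected `run` callable are assumptions made for illustration; the real `config_system_checks_passed` code is not shown in this log.

```python
def system_running_with_reason(run):
    """Check system state and return (ok, reason).

    `run` is a hypothetical callable that takes a command string and
    returns its stdout as text (a stand-in for a DUT shell helper).
    """
    state = run("systemctl is-system-running").strip()
    if state == "running":
        return True, ""
    # Not running (for example "degraded"): collect the failed units
    # so the caller can report why the check did not pass.
    failed_units = run("systemctl list-units --state=failed")
    return False, failed_units
```

Injecting `run` keeps the sketch independent of any particular host object, so it can be exercised with canned command output.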
commit bfb4a5965bbc766f826a8272b08456501e047751
Author: cgangx <95741698+cgangx@users.noreply.github.com>
Date: Wed Jul 27 16:24:37 2022 +0800

Allow LDP traffic on default LDP port (#6019)

Add the LDP default port to the iptables rule set in case LDP is enabled.

What is the motivation for this PR?
LDP is enabled by default in our case and should be added to the iptables rule set.

How did you do it?
Add iptables rules allowing traffic on the LDP default port.

How did you verify/test it?
Run test on virtual testbed.

Co-authored-by: Gang Chen <gach@microsoft.com>

commit a6fde46867abd986ef58bd221397ee2e694b22fd
Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl>
Date: Tue Jul 26 13:55:56 2022 -0700

Updated BFD multihop test to run between non-subnet ip addresses.

commit 6319acec81e2cca81bfacec92bb61ebc58b858e5
Author: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
Date: Tue Jul 26 12:58:05 2022 -0700

Avoid using asic_index as list index to get the asic (#6045)

What is the motivation for this PR?
#5828 added the asics_present field to the inventory to provide the list of asics that are present in the supervisor. After that change, asic_index can no longer be used to retrieve asic_instance from the duthost.asics list. This fix gets the correct asic_instance based on the asic index. Without it, test_pretest can fail on a supervisor where the asics_present entries are not consecutive asic_index values.

How did you do it?
Remove usage of asic_index as list index.

How did you verify/test it?
test_pretest passes on chassis after this fix.

commit f8dff6d503a8b5386822fbecd0215758b46200e2
Author: roman_savchuk <romanx.savchuk@intel.com>
Date: Tue Jul 26 13:20:22 2022 +0300

Added option for enable collecting DB data when TC failed (#5403)

PR 5197 introduced a fixture for collecting DB data when a TC fails. This is a good idea, but it has one major impact on the test suite / regression run. It adds extra regression time to the run.
It will proportionally increase execution time as the number of failed cases increases.

What is the motivation for this PR?
Add an option so that an engineer can choose whether to collect DB data.

How did you do it?
Added the "--collect_db_data" option, which enables collecting DB data if a TC fails.

How did you verify/test it?
Run cases with --collect_db_data: if a TC fails, DB data is collected.
Run cases without --collect_db_data: if a TC fails, DB data is not collected.

commit 4df2f37508a292280beed1936b159fb9f2004285
Author: rskorka <80551811+rskorka@users.noreply.github.com>
Date: Tue Jul 26 03:08:44 2022 -0700

Added support for Cisco 8000 virtual sonic DUT (#5908)

Add playbooks which allow using T0/T1 virtual testbeds with the Cisco 8000e emulator (a functional emulator of Cisco 8000 Series routers).

How did you do it?
Add start/stop playbooks for the new DUT type: 8000e. The playbook starts the emulator inside a docker container (for ease of deployment), and then the vm-topology module can connect it to the topology.

How did you verify/test it?
./testbed-cli.sh -m veos_vtb -n 4 -k vsonic start-vms server_1 password.txt
./testbed-cli.sh -k vsonic -t vtestbed.yaml -m veos_vtb add-topo 8000e-t0 password.txt
./testbed-cli.sh -t vtestbed.yaml -m veos_vtb deploy-mg 8000e-t0 veos_vtb password.txt
cd ../tests
./run_tests.sh -n 8000e-t0 -d vlab-8k-01 -c bgp/test_bgp_fact.py -f ../ansible/vtestbed.yaml -i ../ansible/veos_vtb -e --disable_loganalyzer
bgp/test_bgp_fact.py::test_bgp_facts[vlab-8k-01-None] PASSED

Any platform specific information?
While the 8000e emulator supports many Cisco 8000 platforms, the new 8000e playbooks support the Cisco-8102-C64 platform specifically. Other 8000e platforms will be tested and enabled in the future.
Co-authored-by: Rafal L Skorka <skorka@cisco.com> commit d0052cee1d4f8973683a08b05feab2111af88814 Author: MirceaDan <mircea-dan.gheorghe@keysight.com> Date: Tue Jul 26 03:05:29 2022 -0700 added support for IxNetwork 9.20 Update2 (#6028) updated for spytest the container environment to add support for ixnetwork 9.20u2 What is the motivation for this PR? previous container version was having support for 9.10, but i got requests to add support for 9.20 How did you do it? just a version bump and better versioning for some python packages to avoid conflicts How did you verify/test it? ran a test via the framework Co-authored-by: MirceaDan <ByReaL@users.noreply.github.com> commit f1141557e47d3d58e8b300d287520075b967acc2 Author: vperumal <vperumal@gmail.com> Date: Tue Jul 26 03:04:01 2022 -0700 Skipping test for absent psu's (#5679) What is the motivation for this PR? Currently the tests are trying to access PSU which are not present and failing for them. Even though there is a skip_psu_list, there is no code present to use it. Added the support for it in all the relevant cases. How did you do it? How did you verify/test it? Verified against cisco-8000 platform Co-authored-by: Perumal Venkatesh <pevenkat@cisco.com> commit 4033ab94a0275066b9ec7e5fbcaab03628088913 Author: oleksandrKovtunenko <104843237+oleksandrKovtunenko@users.noreply.github.com> Date: Tue Jul 26 13:02:40 2022 +0300 added skip LAG ports in case T1 TOPO Fix for issue https://github.com/Azure/sonic-mgmt/issues/5578 (#5704) Fix for issues/5578 skip LAG ports in case t1 topology usage What is the motivation for this PR? fix for #5578 exclude portchanel ports in T1 topology How did you do it? skip PortChannel interfaces on T1 topology How did you verify/test it? 
run test_qos_sai.py tests on t1-lag topology and check that portchanel ports are excluded Co-authored-by: alexander Kovtunenko <alexander198961@gmail.com> commit f5bcc52a769aee224539b73637a1b054a1b48ce2 Author: roman_savchuk <romanx.savchuk@intel.com> Date: Tue Jul 26 13:01:26 2022 +0300 [cpu_mem_usage] update TC with wait for all critical services is fully started (#5754) During regression cpu_usage TC run's after TC that does config reload at the end. Due to this syncd CPU usage is higher than TC expects as not all services come up and syncd actively works at period after system reload (reboot). What is the motivation for this PR? Make TC persistent. Avoid failures after DUT reboot (reload) How did you do it? Add wait_until method for check if all critical services come up and than run TC How did you verify/test it? Run cpu_mem_usage test case when services up (normal DUT state) and when services has been restarted. TC passed. Any platform specific information? NOTE: if TC run when all services started it takes app 180 sec to pass TC. after reboot or config reload TC run takes 180 sec + time for critical services to be started (200 sec) commit dba8c50864fed6c28fdc2b9417a2c09bf72de38f Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com> Date: Tue Jul 26 09:32:06 2022 +0800 Fix the json dumps failure issue in sonic.py (#6033) What is the motivation for this PR? When the value of facts["asics_present"] is range(0,1), then the json.dumps(facts) will throw exception: TypeError: Object of type 'range' is not JSON serializable, need to change the range(0,1) to list. This PR is to fix the json.dumps(facts) failure issue How did you do it? Change facts["asics_present"] = asics_present if len(asics_present) != 0 else range(facts["num_asic"]) to facts["asics_present"] = asics_present if len(asics_present) != 0 else list(range(facts["num_asic"])) How did you verify/test it? 
Run the test case and confirm that json.dumps succeeds.

commit 8e0d21d524c939829c73c830219b78e1ff015e16
Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com>
Date: Tue Jul 26 08:11:49 2022 +0800

[EVERFLOW][hotfix] Skip EGRESS MIRRORING test on Broadcom platform (#6026)

* Skip EGRESS MIRRORING test on Broadcom platform
* Skip dnx

commit 2a801bc7fef29cd3e2c6fa714f6508b9cad9b0fe
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Tue Jul 26 07:55:21 2022 +0800

Fix test case issue for dualtor in test_lag_2 (#6025)

What is the motivation for this PR?
test_lag_db_status failed on the dualtor testbed. Two issues:
* It added pytest.fail wrongly. `dut_name, dut_lag = decode_dut_port_name(enum_dut_portchannel_with_completeness_level)` returns one dut_name per case, but we don't know which DUT is chosen for dualtor. That's why we loop over duthosts to find the correct one and run the test. If the loop reaches another duthost, we should skip it, not fail the test case. Add one common function, get_duthost_with_name, to return the duthost and remove two levels of indentation.
* In the recover phase, it uses test_lags wrongly for dualtor: if the test case failed in step 1, test_lags is not defined. We should loop over all duthosts to check whether the interface status is down and, if so, recover it with no shutdown.

Remove the duthost for loop from test_lag_db_status_with_po_update, because this case is for t1-lag only.

How did you do it?
Remove pytest.fail in test cases test_lag_db_status and test_lag_db_status_with_po_update. Enhance the recover steps.

How did you verify/test it?
Run pc/test_lags_2.py::test_lag_db_status.
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>

commit 45b20e67f0aa9541a277c53928c55cda82177f6a
Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com>
Date: Tue Jul 26 04:40:05 2022 +0800

[Qos]Fix the testQosSaiQSharedWatermark test failure due to SAI queue watermark is not reset (#6001)

Configuring `counterpoll watermark enable` doesn't suffice to enable queue watermark polling if counter polling was disabled. As a result, the queue watermarks in SAI will not be cleared successfully, which fails the queue watermark test. To avoid that, counter polling should be enabled for both queue and watermark. With this command, the SAI queue watermark can be cleared.

How did you do it?
Add "counterpoll queue enable" and "counterpoll queue disable" in the resetWatermark fixture.

How did you verify/test it?
Run testQosSaiQSharedWatermark; the test no longer fails because of the SAI queue watermark not being cleared.

commit 0fefd1318014673a351d4afb9cd152cc2f3cb7ef
Author: Ye Jianquan <jianquanye@microsoft.com>
Date: Mon Jul 25 13:14:24 2022 -0700

[RDMA&SNAPPI] Skip snappi warm-reboot testcases on TD2 platform (#6030)

How did you do it?
Compare the skipped testcases of tgen and snappi, and skip the ones that need to be skipped in snappi.

commit 63c03759f44afe2013c600ae8a7248397f2dea8c
Author: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com>
Date: Mon Jul 25 13:03:04 2022 +0300

[auto-ts] Fix and stabilize auto-techsupport tests (#5986)

Summary: Made changes that fix some issues with auto-techsupport and stabilize the TC.

Main changes that were made:
1.) Changed the --since option from '300 sec': 360 to '300 sec ago': 360. This is done because the TC takes the actual time, adds 300 seconds to it, and collects logs and core dumps only starting from that time, according to Date-input-formats.
2.)
Check available_tech_support_files before core dump generation is triggered in test_rate_limit_interval. The test might fail because, when available_tech_support_files is checked after the core dump was generated, the techsupport dump file is already generated, and the TC fails in later steps because it expects new_techdump to be generated.

Signed-off-by: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com>

commit 7157274f04c986501141774fb1b53dc9d42b81d2
Author: Xin Wang <xiwang5@microsoft.com>
Date: Mon Jul 25 15:00:01 2022 +0800

Improve robustness of remove-topo operation (#6020)

What is the motivation for this PR?
The testbed-cli.sh remove-topo operation removes the topology, including cEOS neighbors, ovs bridge bindings, and ovs bridges. For some reason, the ovs bridges to be removed may already be gone. In this case, testbed-cli.sh remove-topo fails because it is not able to find the bridges. This failure is unnecessary because the operation is meant to remove the bridge bindings and bridges. If a bridge is already gone, we can simply ignore it.

How did you do it?
This change improved the vm_topology ansible module to skip a bridge if it does not exist while trying to remove the bridge bindings and bridges. With this change, testbed-cli.sh remove-topo can always succeed and do the job. This PR also improved the error message shown when running a command fails.

How did you verify/test it?
Tested on physical and virtual testbeds. Tried remove-topo, add-topo and restart-ptf with good and broken test topologies.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>

commit 73a2e5a81045a5d2924f5fdc8c16d037ea0a704d
Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
Date: Mon Jul 25 14:45:04 2022 +0800

[dualtor][active-active] Unblock fib tests (#6006)

Approach

What is the motivation for this PR?
Enable test_fib on dualtor-mixed testbeds.

Signed-off-by: Longxiang Lyu lolv@microsoft.com

How did you do it?
1.
Add fixture mux_status_from_nic_simulator to interact with nic_simulator to retrieve mux status for ports in active-active cable type. 2. Add fixture ptf_test_port_map_active_active to build a ptf port map that supports dualtor-mixed testbeds. The key difference is that, for packets ingressing ptf ports connected to DUTs' active-active cable type ports, the target DUT could be either the upper ToR or the lower ToR (both ToRs are active), so the generated ptf port mapping will be like: ``` u'3': {u'asic_idx': 0, u'target_dest_mac': u'00:aa:bb:cc:dd:ee', u'target_dut': [0, 1], u'target_src_mac': [u'2c:dd:e9:0f:4e:50', u'2c:dd:e9:0f:3d:4c']} ``` 3. Enable ptftests hash_test and fib_test to do I/O verification for this multi-next-hop scenario. For a packet sent from a ptf port that connects to an active-active port, the test will try to use the fibs from each ToR to parse its nexthops, and verify the packet forwarding on those nexthops. For ECMP behavior, the test will only check balancing over a single ToR. For example, if a packet destined to 1.1.1.1 is sent to ptf port eth3, which is connected to Ethernet12 of both ToRs, the test will determine the nexthops on both ToRs. If both ToRs forward this packet with the default route, the upper ToR forwards it to ptf ports [30, 32, 34, 36], and the lower ToR forwards it to ptf ports [31, 33, 35, 37], the test will try to verify the packet forwarding on ports [30, 31, 32, 33, 34, 35, 36, 37]. For balancing, the test will verify the traffic is balanced over a single ToR, so it will try to verify balancing on port sets [30, 32, 34, 36] or [31, 33, 35, 37] separately. How did you verify/test it? Run test_fib on dualtor, dualtor-mixed, t0, and t1 Any platform specific information? Supported testbed topology if it's a new test case? commit e027d9b449d78ac4702e8d8fbb840302eae516e3 Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl> Date: Sat Jul 23 16:45:36 2022 -0700 Added support for multi-hop bfd testing. 
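The port-set reasoning in the dualtor fib change (#6006) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `expected_ports` and the `upper_tor`/`lower_tor` keys are hypothetical names, not functions or fields from fib_test or hash_test.

```python
def expected_ports(nexthops_per_tor):
    """Return (forwarding_ports, balancing_groups) for an active-active port.

    nexthops_per_tor maps a ToR name (hypothetical key) to the ptf ports
    that ToR would forward the packet to.
    """
    # Forwarding is accepted on any nexthop port of either ToR.
    forwarding = sorted(set().union(*nexthops_per_tor.values()))
    # Balancing is checked per ToR, never across both ToRs at once.
    balancing = [sorted(ports) for ports in nexthops_per_tor.values()]
    return forwarding, balancing


# Port numbers taken from the example in the commit message.
forwarding, balancing = expected_ports({
    "upper_tor": [30, 32, 34, 36],
    "lower_tor": [31, 33, 35, 37],
})
```

Here `forwarding` is the union of both ToRs' ports, while `balancing` keeps one group per ToR so that traffic distribution is evaluated within a single ToR only.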
commit 857e83ef07a48af973fa458b268f6db67b0f1454 Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com> Date: Sat Jul 23 11:01:54 2022 +0800 [dualtor] Leave `icmp_responder` running on `dualtor-mixed` testbeds (#5972) Approach What is the motivation for this PR? On dualtor-mixed testbeds, leave icmp_responder running to avoid introducing unnecessary toggles. Signed-off-by: Longxiang Lyu lolv@microsoft.com How did you do it? Comment out the teardown that stops icmp_responder How did you verify/test it? commit ca4d0fa69063a1a2b3a119d4103304f35d7c0634 Author: Xin Wang <xiwang5@microsoft.com> Date: Fri Jul 22 13:56:13 2022 +0800 Fix filename issue and collect important DB dumps (#6007) What is the motivation for this PR? The autoused fixture collect_db_dump has some issues: * If a test case name has some special characters, the fixture may fail. * It automatically collects dumps of all databases for each failed test case. It's unnecessary to collect all the DB dumps. How did you do it? * The collected dumps are stored in a folder named after the test case. However, a test case name may have characters that can't be used in a file name. This change added a utility function that removes characters illegal in file names to convert a string to a safe filename. * Not all the collected database dumps are useful for later troubleshooting. This change improved the fixture to only collect dumps of some important databases. * This change also deleted the fetch_dbs function, which is not used anywhere. Signed-off-by: Xin Wang <xiwang5@microsoft.com> commit cfc550a27ace8043dd171c113b50a8627c03ecda Author: Shahzad Iqbal <shahzadiqbal@microsoft.com> Date: Thu Jul 21 16:38:29 2022 -0700 LGTM alerts fixed. Removed unused function. 
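As an illustration of the safe-filename idea in #6007 above, the following is a minimal sketch, not the utility added by that PR. The name `safe_filename`, the character set, and the choice to substitute an underscore (rather than simply drop the characters) are all assumptions made for this example.

```python
import re


def safe_filename(name, replacement="_"):
    """Turn an arbitrary string, such as a pytest node id, into a file-safe name.

    Hypothetical helper: collapses runs of characters that are commonly
    illegal or awkward in file names into a single replacement character.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\[\]\s]+', replacement, name)
    # Avoid leading or trailing separators left over from the substitution.
    return cleaned.strip(replacement)


folder = safe_filename("pfcwd/test_pfcwd_warm_reboot.py::TestPfcwdWb::test_pfcwd_wb[no_storm]")
```

A parametrized test id such as the one above contains `/`, `::`, `[` and `]`, which is the kind of name that would otherwise break a dump folder path.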
commit e33b2284dbe6ae74234c901602d1e1ffaceaf5dd Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Thu Jul 21 19:41:50 2022 +0300 [auto_techsupport] aligned auto_tech_support tests with latest CLI on SONiC master image (#5954) The CLI has been changed, and as a result tests started to fail. The CLI parser is now fixed by implementing multi-branch support, and tests should pass. Signed-off-by: Petro Pikh <petrop@nvidia.com> commit ffcdd14920e760bc2adeeb9ea6e031075f0c4b55 Author: StormLiangMS <89824293+StormLiangMS@users.noreply.github.com> Date: Thu Jul 21 01:54:29 2022 -0700 [vlan/test_vlan_ping] skip test on broadcom platform #6012 What is the motivation for this PR? To skip test test_vlan_ping, which doesn't work on the broadcom platform. How did you do it? Add the test to tests/common/plugins/conditional_mark/tests_mark_conditions.yaml to skip it. How did you verify/test it? Any platform specific information? broadcom platform Supported testbed topology if it's a new test case? commit b1866a1ba00b2b338356e5999c09c549cb7c64a3 Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com> Date: Thu Jul 21 11:28:16 2022 +0800 [nic_simulator] Add timeout and common options (#6005) Approach What is the motivation for this PR? The mgmt client could not talk to the gRPC server. Signed-off-by: Longxiang Lyu <lolv@microsoft.com> How did you do it? Bring up the loopback device to enable the mgmt service to talk to each gRPC server interacting with SONiC. Add common gRPC options to the nic_simulator. Add a timeout for the gRPC calls from the mgmt service. How did you verify/test it? Verify the mgmt client can talk to the mgmt service of `nic_simulator`. Any platform specific information? 
commit 9acb53139a151a7ade913335b72a4cfa778781d5 Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Thu Jul 21 10:40:50 2022 +0800 Update require option to False for os_version in test_reporter.py to avoid uploading failure in nightly test (#6015) What is the motivation for this PR? In https://github.com/Azure/sonic-mgmt/pull/5992, wrongly add os_version parameter as required True. It should be false, because if with required True, test_reporter.py will fail to run for current nightly test. It impacts nightly test, should make it robust with current nightly test yaml file. How did you do it? Change it to required False. How did you verify/test it? Run nightly test with pipeline. Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 656624555e2fa5f99a6ceef740b05d96bd5790ff Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Thu Jul 21 08:47:30 2022 +0800 Enhance test report to include pipeline results (#5992) What is the motivation for this PR? Currently, when nightly test pipeline fails before running test, test report upload will fail too, because there is no XML file and it will throw error out. We don't know if pipeline does not run on that day, or it fails. How did you do it? Enhance test report upload scripts to record pipeline status and upload it to kusto. Add a new collect_azp_results.py to collect task status for specific pipeline. If there is no XML file, upload summary table with 0 values. Create a new table TestReportPipeline, record testbed name, os version, success tasks, failed tasks and cancelled tasks and upload the record to kusto. How did you verify/test it? Run nightly test and check kusto. 
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit cbab17e5b512a9f5028dfde05ff7995df9028d1f Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Thu Jul 21 03:30:47 2022 +0300 [conditional_mark] Improved conditional_mark plugin to support "OR" or "AND" condition between condition in conditions list (#6008) Improved conditional_mark plugin to support "OR" or "AND" operand between condition in conditions list Previously every time we did AND operand between condition in conditions list, now we can provide "conditions_logical_operator" argument with operation which should be performed between conditions. Possible arguments (by default, if not provided - AND used): ``` conditions_logical_operator: or conditions_logical_operator: and ``` Example of usage (test will be ignored if first or second condition in list True): ``` ecmp/test_fgnhg.py: skip: reason: "Testcase ignored - check ignore condition in ignore file" conditions_logical_operator: or conditions: - "https://redmine.x.com/issues/12345 and 'msn2' in platform" - https://redmine.x.com/issues/54321 ``` Signed-off-by: Petro Pikh <petrop@nvidia.com> commit f526c6f6de1de6e100bf59240155b1d50a824e9d Author: Richard.Yu <richard.yu@microsoft.com> Date: Thu Jul 21 08:00:42 2022 +0800 add fdb test cases (#6004) * add fdb test cases Signed-off-by: richardyu-ms <richard.yu@microsoft.com> * Update cases_brcm_t0.py commit 2c34f877cf1d537d81fdc1b40a7e8926288b332f Author: Jing Zhang <zhangjing@microsoft.com> Date: Wed Jul 20 14:42:01 2022 -0700 [LogAnalyzer] white-list Could not get port instance error message (#5995) Summary: Fixes # (issue) This PR is to temporarily white-list the error log below: Jul 14 18:56:30.247123 svcstr-7050-acs-2 ERR pmon#ycable[30]: Error: Could not get port instance for muxcable info for Y cable port Ethernet24 There should be an image fix soon for this. 
sign-off: Jing Zhang zhangjing@microsoft.com commit 85ce59619567317fbd6869c04937ca9ba33ff5e2 Author: mannytaheri <86314901+mannytaheri@users.noreply.github.com> Date: Wed Jul 20 15:38:34 2022 -0400 enum_asic fixtures to use only asics that are present instead of all asics based on num_asics (#5828) What is the motivation for this PR? Tests that are parameterized using the enum_asic fixtures fail when the selected asic index is not present (operational) in the DUT. An example of this is the supervisor card in a chassis, where not all SFMs may be present in the chassis. In this case, only asics corresponding to the SFMs present in the chassis are operational, but not the others. For example, on a Nokia chassis, if w…

* Squashed commit of the following: commit 0b6042544e8dcccdcd79a25c7748fd11b9bc27ad Author: siqbal1486 <shahzad.iqbal@microsoft.com> Date: Wed Aug 10 15:26:43 2022 -0700 changed suggested in review. cleanup commit 0cc1d72b7e0c5da97815fc0a69d12d2a0c2171a9 Merge: f6f02f03 6850440d Author: siqbal1486 <shahzad.iqbal@microsoft.com> Date: Wed Aug 10 14:31:28 2022 -0700 Merge branch 'bfd_test_multihop' of https://github.com/siqbal1986/sonic-mgmt into bfd_test_multihop commit 6850440d5f90a1e2ae0d78c2f2f42f9fc39b3c95 Merge: 5924f75c 93323578 Author: siqbal1986 <shahzad.iqbal@gmail.com> Date: Wed Aug 10 14:30:19 2022 -0700 Merge branch 'master' into bfd_test_multihop commit f6f02f036b767dc5012b55bee92f4f3944470083 Merge: f78da62f 5924f75c Author: siqbal1486 <shahzad.iqbal@microsoft.com> Date: Wed Aug 10 13:06:42 2022 -0700 Merge branch 'bfd_test_multihop' of https://github.com/siqbal1986/sonic-mgmt into bfd_test_multihop commit 9332357850282dd61dba5bbfbe68463dd088e91d Author: Jibin Bao <jbao@nvidia.com> Date: Thu Aug 11 00:09:18 2022 +0800 Add test plan for syslog source ip feature (#5943) commit fb51ba2b092ea48d9233e9f3efcc1811afef2668 Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com> Date: Thu Aug 11 00:06:43 2022 +0800 [Qos]TestQosSai should not be skipped on ptf32, ptf64 topo (#6112) - What is the motivation for this PR? For mellanox asic, TestQosSai support to run on ptf32,ptf64 topos, it should be skip on these topos - How did you do it? Add support for ptf32,ptf64 in tests/common/plugins/conditional_mark/tests_mark_conditions.yaml - How did you verify/test it? Run the TestQosSai on ptf topo, and it is not skipped. 
Change-Id: I6d37aca287e8e797ae43de903920fb61c2e1ae9c commit 22ac478a87f1c81643e6733cb3090e8d5f696d9e Author: Ashwin Srinivasan <93744978+assrinivasan@users.noreply.github.com> Date: Wed Aug 10 08:56:03 2022 -0700 Removed the superfluous pdb trace command from the get_healthy_psu_num function in test_platform_info (#6135) commit 22fb68f8ade261eabaf323ba85ec63028d324d75 Author: Cong Hou <97947969+congh-nvidia@users.noreply.github.com> Date: Wed Aug 10 23:16:45 2022 +0800 [sub-interface] use OrderedDict instead of built-in dict for ptf and dut ports in get_port() function of sub-interface test (#6125) The function get_port() in tests/sub_port_interfaces/sub_ports_helpers.py uses a built-in dictionary to store the dut ports and ptf ports selected for the subinterface test. However, because the built-in dict has no guaranteed order, sometimes a dut port could be paired with the wrong ptf port, which causes the test to fail. In the function get_ports() the dut ports are returned as a dict and the ptf ports are returned as a list of the dict values, and they are zipped in the caller for iteration. It is not guaranteed that, when zipping, each dut port is paired with the correct ptf port. For example in tests/sub_port_interfaces/conftest.py So we need to use OrderedDict instead of the built-in dictionary to store the selected dut ports and ptf ports in get_ports(). commit f3748cfef4bca1604037ab116586c7b33a2c8b81 Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com> Date: Wed Aug 10 20:49:59 2022 +0800 [bugfix] skip vlan/test_vlan_ping.py (#6137) Description of PR In pr #5708, we skip the test cases in tests/common/plugins/conditional_mark/tests_mark_conditions.yaml. There was a merge conflict, and the skip for vlan/test_vlan_ping.py when the asic_type is broadcom was missed. In this pr, skip this module. What is the motivation for this PR? In pr #5708, we skip the test cases in tests/common/plugins/conditional_mark/tests_mark_conditions.yaml. 
There is a merge conflict and forget to skip vlan/test_vlan_ping.py when the asic_type is broadcom. In this pr, skip this module. How did you do it? Add the condition to skip vlan/test_vlan_ping.py. Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com> commit b1f80d1cc63b092f280e6ed3d6e80da251b1fbed Author: Kostiantyn Yarovyi <kostiantynx.yarovyi@intel.com> Date: Wed Aug 10 13:03:29 2022 +0200 add sleep after remove vrf (#6133) What is the motivation for this PR? vrf does not have enough time to remove before a creation. Therefore a test TestVrfDeletion::test_vrf1_neigh_after_restore failed How did you do it? add sleep How did you verify/test it? run vrf/test_vrf.py::TestVrfDeletion::test_vrf1_neigh_after_restore commit 5b9a30c112c08b611c9212efdc30262c31ce7cd1 Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com> Date: Wed Aug 10 16:33:16 2022 +0800 Restore tacacs_server after the module tacacs/test_accounting.py running. (#6117) Description of PR In module tacacs/test_accounting.py, the fixture check_tacacs use the function setup_tacacs_client to delete the default tacacs server, and set the ptf mgmt ip as tacacs sever ip. But it doesn't restore this config when the module finish running. We want to keep the config in consistent before and after the testcase running, so fix it. What is the motivation for this PR? In module tacacs/test_accounting.py, the fixture check_tacacs use the function setup_tacacs_client to delete the default tacacs server, and set the ptf mgmt ip as tacacs sever ip. But it doesn't restore this config when the module finish running. We want to keep the config in consistent before and after the testcase running, so fix it. How did you do it? Get the default tacacs server and put them into a list, when the module finish running, delete the ptf mgmt ipand restore the default tacacs server ip. How did you verify/test it? Running the test cases in this module and compare the tacacs server ip before and after running. 
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com> commit a60e5a6e1d32ea7fa104046d36aae8d1ba707dd9 Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Wed Aug 10 15:02:15 2022 +0800 Add StartTimestamp column in TestReportPipeline table (#6132) What is the motivation for this PR? Add StartTimestamp column in TestReportPipeline table How did you do it? Use another API to get the start time of pipeline and upload it to Kusto. How did you verify/test it? python3 collect_azp_results.py 8888 python3 report_uploader.py -c "test_result" -e "vms-t0-kvm.201911#132728" -t "vms-t0-kvm" -i "http://****/sonic-broadcom.bin" results SonicTestData Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 5514acb1a07a61284ce481d788274ad5d2a9ac18 Author: Ihor Chekh <ichekh@nvidia.com> Date: Wed Aug 10 00:42:20 2022 +0300 BFD test fixes and improvements (#6082) *Single hop BFD test fixes and improvements commit f78da62f1afc481fd5b38dc716fce18163a88625 Merge: 3ea15a68 c309ff26 Author: siqbal1986 <shahzad.iqbal@gmail.com> Date: Tue Aug 9 13:01:16 2022 -0700 Merge branch 'master' into bfd_test_multihop commit 5924f75c2db3dbad70c6373f989fdb20b74345d4 Merge: 3ea15a68 c309ff26 Author: siqbal1986 <shahzad.iqbal@gmail.com> Date: Tue Aug 9 13:01:16 2022 -0700 Merge branch 'master' into bfd_test_multihop commit c309ff26b1a4dd1783782f68aeffe464ca68f463 Author: Ye Jianquan <jianquanye@microsoft.com> Date: Tue Aug 9 17:25:36 2022 +0800 [TestbedV2]Convert t1-lag pr test to TestbedV2 (#6127) Convert t1-lag pr test to TestbedV2 Approach What is the motivation for this PR? Convert the t1-lag pr test to TestbedV2, to reduce test time by distributing test cases on multi-instances. Currently, the preparation of the testbed(add-topo, deploy-mg) is operated implicitly, before the testbed is ready, the progress of the test plan keeps 0. We will refine the progress indicator in a future release. 
The conversion can be dynamically reverted by modifying an AZP library variable: Testbed-Tools/RUN_TEST_BY_SCHEDULER : YES/NO How did you do it? Modify the pipeline yaml file. After converting to TestbedV2, the AZP only creates the test plan and polls the result of the test plan. How did you verify/test it? The pass result of this pr is the test result of this pr. Signed-off-by: Jianquan Ye<jianquanye@microsoft.com> commit c3f124f34a37e1fca93aab2e912b418cbb084841 Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Tue Aug 9 06:20:48 2022 +0300 Fixed dut_basic_facts ansible module to support SONiC images which do not have attribute "is_supervisor" (#6118) Description of PR Fixed dut_basic_facts ansible module to support SONiC images which do not have attribute "is_supervisor". Previously, when we called the dut_basic_facts ansible module on a SONiC image which does not have attribute "is_supervisor" (for example: 202012), we received the error: AttributeError("'module' object has no attribute 'is_supervisor'") Now the issue is fixed and the script works on all SONiC branches. Issue introduced in PR: #5708 Summary: Fixed dut_basic_facts ansible module to support SONiC images which do not have attribute "is_supervisor" What is the motivation for this PR? Fix AttributeError("'module' object has no attribute 'is_supervisor'") How did you do it? See code How did you verify/test it? Executed ansible module: dut_basic_facts Signed-off-by: Petro Pikh <petrop@nvidia.com> commit 5ee2f0cd3237114bd4bc0b0cc910dac2788f8123 Author: Ze Gan <ganze718@gmail.com> Date: Tue Aug 9 10:11:39 2022 +0800 Revert "[kvmtest.sh]: Ignore test_t0_sonic temporarily (#6104)" (#6119) This reverts commit d3bc674964cf4244994bb204b37e0c19e140ca10. 
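The AttributeError described in the dut_basic_facts fix (#6118) above is the usual symptom of calling a function that older library versions do not provide. The commit only says "See code", so the following is a generic sketch of one way to guard such a call, not the actual change. `is_supervisor_or_default` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for an older and a newer module.

```python
import types


def is_supervisor_or_default(module, default=False):
    """Call module.is_supervisor() if it exists, else return a default.

    Hypothetical guard: older images may ship a module without this attribute.
    """
    func = getattr(module, "is_supervisor", None)
    return func() if callable(func) else default


# Stand-ins for a module without and with the attribute (assumptions).
old_module = types.SimpleNamespace()
new_module = types.SimpleNamespace(is_supervisor=lambda: True)

old_result = is_supervisor_or_default(old_module)
new_result = is_supervisor_or_default(new_module)
```

Using `getattr` with a default keeps a single code path for all branches instead of wrapping the call in a try/except for AttributeError.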
commit f46f36171819265a3514c48509e8e8b685593ae2 Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com> Date: Tue Aug 9 07:57:36 2022 +0800 [bugfix] Fix an error in tests_mark_conditions.yaml (#6113) Description of PR There is a condition error in tests_mark_conditions.yaml; fix it. What is the motivation for this PR? There is a condition error in tests_mark_conditions.yaml; fix it. Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com> commit bf02aeae1d6c8e7d4523e21ec5fdf8e04688b689 Author: Lawrence Lee <lawlee@microsoft.com> Date: Mon Aug 8 14:51:06 2022 -0700 [dualtor]: Resolve neighbor after neighbor removal (#6071) - After restarting arp_responder during the test, also restart arp_update process on the DUT to resolve failed neighbor entries - Improve test case cleanup Signed-off-by: Lawrence Lee <lawlee@microsoft.com> commit 41d7b15524017f9a267ff010846d5a1f9681b307 Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com> Date: Mon Aug 8 20:34:45 2022 +0800 Add rif loopback action test plan (#5956) Add test plan for the RIF interface loopback action feature. The HLD for the RIF interface loopback action: https://github.com/sonic-net/SONiC/blob/master/doc/ip-interface/loopback-action/ip-interface-loopback-action-design.md commit 24dad8f4b7036376b873c4a0a71a5b7d8a649be8 Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com> Date: Mon Aug 8 11:01:03 2022 +0800 Remove "BGP_BBR" in config after test case test_bbr_disabled_dut_asn_in_aspath running. (#6102) Description of PR While the test case test_bbr_disabled_dut_asn_in_aspath is running, it changes the status of "BGP_BBR" in config db. But before running, there is no "BGP_BBR" in config db. This causes an inconsistency in config db before and after the test case runs. In this pr, we delete the key "BGP_BBR" in config db after the test case runs. What is the motivation for this PR? 
While the test case test_bbr_disabled_dut_asn_in_aspath is running, it changes the status of "BGP_BBR" in config db. But before running, there is no "BGP_BBR" in config db. This causes an inconsistency in config db before and after the test case runs. In this pr, we delete the key "BGP_BBR" in config db after the test case runs. How did you do it? Use configlet to delete the config after the test case runs. How did you verify/test it? Check the config db before and after the test case runs. Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com> commit e011ed0ac2ea7e062bf4f6d55177ecfd4e907569 Author: Jing Zhang <zhangjing@microsoft.com> Date: Sun Aug 7 18:42:01 2022 -0700 Enable `test_normal_op` test cases on active-active dualtor interfaces (#5984) Approach What is the motivation for this PR? To enable dualtor io tests on active-active dualtor ports. How did you do it? 1. Added control utilities in nic_simulator_control, for toggling active-active interfaces to standby/active states on any or both duthosts. Toggling is triggered through the command line, which differs from active-standby ports. 2. Added active-active type in cable_type fixture. 3. Updated test_normal_op cases to adapt to active-active interfaces. For some cases, disruption is no longer expected. 4. Adjusted test names and comments to better suit today's usage. How did you verify/test it? Run test cases on mixed topology. commit d3bc674964cf4244994bb204b37e0c19e140ca10 Author: Ze Gan <ganze718@gmail.com> Date: Sun Aug 7 18:41:38 2022 +0800 [kvmtest.sh]: Ignore test_t0_sonic temporarily (#6104) What is the motivation for this PR? There is a bug with vsonic as neighbor devices; ignore t0_sonic temporarily and add it back once the bug is fixed. How did you do it? Add || true in ./run_tests.sh to ignore the test result. 
Signed-off-by: Ze Gan <ganze718@gmail.com> commit 35a1f1e5b0fc788648dab9405ec2a478732ed99e Author: Xin Wang <xiwang5@microsoft.com> Date: Sat Aug 6 15:40:18 2022 +0800 Fix cEOS duplicated mac address issue on Ubuntu 22.04 (#6090) What is the motivation for this PR? If deploy a topology using cEOS, one of the steps is to create veth interfaces for the cEOS docker containers. For example, the current steps to create backplane interfaces: 1.1 Create veth pair in host for container VM0100 ip link add VM0100-back type veth peer name eth5 1.2 Add the eth5 interface to network namespace of container VM0100 2.1 Create veth pair in host for another container VM0101 ip link add VM0101-back type veth peer name eth5 2.2 Add the eth5 interface to network namespace of container VM0101 As we can see that after step 1.2, eth5 is no longer in the host namespace. Then in step 2.1 we can add another interface with same name eth5. The problem is that on Ubuntu 22.04, mac address of eth5 created in step 2.1 will be the same as the eth5 interface created in step 1.1. Possibly Ubuntu 22.04 is using a different algorithm for assigning mac address to new veth interfaces. If interface name is same, then mac address will be same too. Because all the VMxxxx-back interfaces will be attached to a same ovs bridge, their peer interfaces should not use same mac address. How did you do it? The fix is to create veth interfaces with unique name in host for all cEOS containers in the beginning. Then all the interfaces in different cEOS have unique mac address. How did you verify/test it? Tested using 'testbed-cli.sh remove-topo' and 'testbed-cli.sh add-topo' Signed-off-by: Xin Wang <xiwang5@microsoft.com> commit a3268ac644162c6feff44163f023a0df41ad337a Author: jingwenxie <jingwenxie@microsoft.com> Date: Sat Aug 6 05:26:55 2022 +0800 [tests/configlet] Remove ignore path in addrack test (#6088) Summary: Remove the ignore path that were blocked by YANG before. 
### Approach #### What is the motivation for this PR? The ignore_path should be removed in apply-patch operation. #### How did you do it? Remove ignore_path. commit 180641d9fbcbe21af61b6c75364e7d799454649e Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com> Date: Fri Aug 5 11:31:08 2022 -0700 [decap] Bug fix: add missing import for util function in test_decap (#6072) Bug fix: add missing import for util function in test_decap test_decap is failing on master branch due to a bug introduced by #5834 The changes were tested on 202012, but not on master where the json import did not exist. commit 8ac482562757308f2bc24608d423e3fddc477c06 Author: ShiyanWangMS <shiyanwang@microsoft.com> Date: Thu Aug 4 21:58:59 2022 -0700 Improve debug capability for testcase [test_ecn_during_decap_on_active] (#6091) What is the motivation for this PR? The testcase(test_ecn_during_decap_on_active) results are not stable. Sometime it will fail due to not receiving expected packets. And there is no useful debug information in log file. How did you do it? Add "portstat -c" before sending packets and add "portstat -j" after sending packets. Add "show arp" to quickly identify which is the RX/TX port. How did you verify/test it? Manually run the testcase without Python error. commit 4cac24854714eb8c52a682311bddbb454f3874ee Author: Ashwin Srinivasan <93744978+assrinivasan@users.noreply.github.com> Date: Thu Aug 4 09:59:14 2022 -0700 Adds a function to get the number of healthy PSUs in a device (#6060) commit 98752da2c0410dd31696d400651c516facce62c2 Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com> Date: Thu Aug 4 14:21:59 2022 +0800 Replace `pytest.skip` in test scripts with conditional marks. (#5708) Description of PR When we use pytest.skip in test scripts, it will first execute some fixtures in test cases, which will waste some time. 
When using conditional marks to skip test cases, it will skip the case in the collect period, which will not execute fixtures in test cases and save some execute time. In this pr, we replace pytest.skip in test scripts with conditional marks to skip test case in advance and save execute time. What is the motivation for this PR? When we use pytest.skip in test scripts, it will first execute some fixtures in test cases, which will waste some time. When using conditional marks to skip test cases, it will skip the case in the collect period, which will not execute fixtures in test cases and save some running time. In this pr, we replace pytest.skip in test scripts with conditional marks to skip test case in advance and save running time. How did you do it? Replace pytest.skip in test scripts with conditional marks in tests_mark_conditions.yaml. How did you verify/test it? By running whole test cases and observe the running time. Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com> commit 5eb6d94ad0ce3cb25e82b4f47f21044ddc83ce41 Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com> Date: Thu Aug 4 10:51:01 2022 +0800 Fix issue: there should be one DSCP mapped to queue 2/6 in non dual-ToR scenarios (#6089) Signed-off-by: Stephen Sun <stephens@nvidia.com> commit 709d503a8f73ec6d31d9173d7c7701ae0f520ed9 Author: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com> Date: Wed Aug 3 09:57:34 2022 -0700 [kvm]Avoid running ebtables test in KVM (#6073) *Avoid running ebtables test in KVM sonic-net/sonic-buildimage#11585 ebtables shouldn't be installed in KVM which blocks L2 forwarding. So removed the logic to install ebtables rules in SONiC. Hence removing the ebtables tests to be executed on KVM. commit c8ff3f0d69f558fee6b48c0d6510b5e7693ec8c2 Author: rraghav-cisco <58446052+rraghav-cisco@users.noreply.github.com> Date: Wed Aug 3 09:07:51 2022 -0700 Adding cisco-8000 to the list of platforms for forward action. 
(#6068) For cisco-8000 platforms, set forward action on Rx in presence of pfc-wd Change is made after: #5665 commit 96a460309c8c98038d075941b6f050cfb708c025 Author: Neetha John <nejo@microsoft.com> Date: Wed Aug 3 09:06:56 2022 -0700 [swap_syncd] Avoid bgp idle check since bgp docker is already down (#6083) Signed-off-by: Neetha John <nejo@microsoft.com> What is the motivation for this PR? With the latest image, qos sai tests are failing during test setup with the following error. This is due to the changes introduced in sonic-net/sonic-buildimage#11000. Since swss docker is already stopped prior to this check, bgp docker is stopped and hence the show commands will no longer work 02/08/2022 13:14:58 utilities.wait_until L0113 ERROR | Exception caught while checking ready_for_swap:Traceback (most recent call last): File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/utilities.py", line 107, in wait_until check_result = condition(*args, **kwargs) File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/system_utils/docker.py", line 186, in ready_for_swap not duthost.is_bgp_state_idle() File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/devices/multi_asic.py", line 304, in is_bgp_state_idle return self.sonichost.is_bgp_state_idle() File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/devices/sonic.py", line 1657, in is_bgp_state_idle bgp_summary = self.command("show ip bgp summary")["stdout_lines"] File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/devices/base.py", line 89, in _run raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res) RunAnsibleModuleFail: run module command failed, Ansible Results => { "changed": true, "cmd": [ "show", "ip", "bgp", "summary" ], "delta": "0:00:00.881363", "end": "2022-08-02 13:14:58.004283", "failed": true, "invocation": { "module_args": { "_raw_params": "show ip bgp summary", "_uses_shell": false, "argv": null, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": 
null, "stdin_add_newline": true, "strip_empty_ends": true, "warn": true } }, "msg": "non-zero return code", "rc": 2, "start": "2022-08-02 13:14:57.122920", "stderr": "Usage: show ip [OPTIONS] COMMAND [ARGS]...\nTry "show ip -h" for help.\n\nError: No such command "bgp".", "stderr_lines": [ "Usage: show ip [OPTIONS] COMMAND [ARGS]...", "Try "show ip -h" for help.", "", "Error: No such command "bgp"." ], "stdout": "", "stdout_lines": [] } How did you do it? Remove the bgp idle state check How did you verify/test it? Ran the qos sai testcase with the changes and it passed commit d6e332386641a3e5344220e7b06316c7acc40722 Author: jingwenxie <jingwenxie@microsoft.com> Date: Wed Aug 3 15:42:36 2022 +0800 [tests/override_config_table] Add empty table removal test (#6087) Summary: Add E2E test for empty table removal in Golden Config What is the motivation for this PR? We should have an agreement on how Golden Config removes initial table config. This test is to verify the empty table removal in the E2E test. How did you do it? Add E2E test for empty table removal in Golden Config. How did you verify/test it? kvm test. commit a3e2a8e280da76b33486b3bc4d7c89400bbdc373 Author: jingwenxie <jingwenxie@microsoft.com> Date: Wed Aug 3 15:41:50 2022 +0800 [GCU] Change to identical DUT for cacl test (#6086) Summary: Resolve dualTor cacl test failure. What is the motivation for this PR? Change to identical DUT for cacl test2 How did you do it? Change fixture from duthost to rand_selected_dut commit 185d46f90e2b829e9304a709b4ba6caa294040dd Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Wed Aug 3 15:13:59 2022 +0800 Support 4 lossless queues in `test_buffer_deployment` (#5921) * Support 4 lossless queues in test_buffer_deployment commit 7d1fb7a2b7d3fcd24a150d59aef4f02133f237c5 Author: Ze Gan <ganze718@gmail.com> Date: Wed Aug 3 15:08:40 2022 +0800 [loganalyzer]: Add ignore log for wpa_supplicant (#6075) What is the motivation for this PR? 
wpa_supplicant may get the following log ERR macsec#wpa_supplicant[15]: KaY: Life time has not elapsed since prior SAK distributed when the rekey action. But the mka session was finally be recovered otherwise the testcases will fail instead of log checker. How did you do it? Add r, ".* ERR macsec#wpa_supplicant.*KaY: Life time has not elapsed since prior SAK distributed.*" in loganalyzer_common_ignore.txt Signed-off-by: Ze Gan <ganze718@gmail.com> commit 6658606edff2694f4832fe06c69ad8ac5e1452d8 Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Mon Aug 1 16:42:13 2022 +0800 Add test_lag_db_status and test_lag_db_status_with_po_update back (#6074) What is the motivation for this PR? Add test_lag_db_status and test_lag_db_status_with_po_update back which were reverted in #6058. How did you do it? Enhance these two cases to support 202012 kvm testbed, use wait_until instead of checking interface status immediately after shutdown or no shutdown. How did you verify/test it? Verified test_lag_db_status and test_lag_db_status_with_po_update on these testbeds: * kvm testbed with master image * kvm testbed with 202012 image * T0 * T1-lag * Dualtor Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 3ea15a685ed58cc93b78e8d9098ff687cd41c8fc Merge: 17720c67 fa259b22 Author: siqbal1986 <shahzad.iqbal@gmail.com> Date: Sun Jul 31 15:37:26 2022 -0700 Merge branch 'master' into bfd_test_multihop commit 17720c671dd125b6ef3237b28a893648b582afe1 Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl> Date: Sun Jul 31 15:21:47 2022 -0700 minor changes suggested in review. commit fa259b22248a957c20ff38abe5c1c82718389fe9 Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Fri Jul 29 04:05:16 2022 +0800 Revert 2 test cases in test_lag_2 and reduce t1-lag running time (#6058) What is the motivation for this PR? test_lag_db_status_with_po_update can ran success on master image. But failed for image 202012. 
Because for 202012, there is no netdev_oper_status key in PORT_TABLE ofSTATE_DB, the test case chooses oper_status in APPL_DB , but sync time seems to be different, after shutdown, it checks the status of oper_status immediately, it should wait for a while until the status is correct. Revert these 2 cases firstly, will submit a new PR to add them after enough verification on 202012 and master image. How did you do it? Two changes in this PR: Revert test_lag_db_status and test_lag_db_status_with_po_update. Use --completeness_level=confident to reduce running time, it will pick up 4 ports, not all of them for t1-lag. How did you verify/test it? Run pc/test_lag_2.py Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 5153efb5b98cad7e21f61bcd01cc7b4cfcc84582 Author: Xin Wang <xiwang5@microsoft.com> Date: Thu Jul 28 16:11:29 2022 +0800 Disable CI testing keep only PR testing (#6061) What is the motivation for this PR? Currently the pipeline has both PR and CI testing enabled. After a PR is merged, the pipeline is triggered again. This is a waste of resource. How did you do it? This change disabled CI testing. The PR testing will be still triggered as usual. Signed-off-by: Xin Wang <xiwang5@microsoft.com> commit 15aa60774a529862c2c348f8af5435a3afc61d0a Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Thu Jul 28 14:52:01 2022 +0800 Remove 'active_tor_mac' from fixture dualtor_info (#6053) PR #5923 updated fixture dualtor_info to add active_tor_mac. However, it caused issue for some other test cases because the return dict is directly passed to check_tunnel_balance, and the function complained that ``` File "/azp/agent/_work/5/s/tests/dualtor/test_orchagent_standby_tor_downstream.py", line 204, in test_standby_tor_downstream_loopback_route_readded check_tunnel_balance(**params) TypeError: check_tunnel_balance() got an unexpected keyword argument 'active_tor_mac' ``` This PR addressed the issue by removing active_tor_mac from the fixture. 
The variable is added for test case test_encap_dscp_rewrite and test_bounced_back_traffic_in_expected_queue What is the motivation for this PR? This PR is to fix issue caused by updating fixture dualtor_info. How did you do it? Remove the newly added variable active_tor_mac from fixture dualtor_info. How did you verify/test it? The change is verified by running test cases test_standby_tor_downstream_loopback_route_readded and test_tunnel_qos_remap.py. commit e5daa2aa0c11c949290b2a5e576ad66798cbb81d Author: Liu Shilong <shilongliu@microsoft.com> Date: Thu Jul 28 14:37:42 2022 +0800 [ci] Transfer organization from Azure to sonic-net (#6059) Description of PR Summary: Transfer organization from Azure to sonic-net commit 60886753bd4e4ddb582438c9a0f09711699e792d Author: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> Date: Thu Jul 28 08:22:53 2022 +0300 Fix vrf "KeyError: 'target_dest_mac'" (#5785) Vrf tests cases fail with KeyError 'target_dest_mac', this issue started to appear after PR - 5456 Changes made in this PR should fix KeyError issue in vrf TC "fib_test.FibTest ... 
ERROR", "", "======================================================================", "ERROR: fib_test.FibTest", "----------------------------------------------------------------------", "Traceback (most recent call last):", " File \"ptftests/fib_test.py\", line 458, in runTest", " self.check_ip_ranges()", " File \"ptftests/fib_test.py\", line 163, in check_ip_ranges", " self.check_ip_range(ip_range, dut_index, ipv4)", " File \"ptftests/fib_test.py\", line 204, in check_ip_range", " self.check_ip_route(src_port, dst_ip, exp_ports, ipv4)", " File \"ptftests/fib_test.py\", line 237, in check_ip_route", " res = self.check_ipv4_route(src_port, dst_ip_addr, dst_port_list)", " File \"ptftests/fib_test.py\", line 267, in check_ipv4_route", " router_mac = self.ptf_test_port_map[str(src_port)]['target_dest_mac']", "KeyError: 'target_dest_mac'", Signed-off-by: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> commit 7c7d5a25ce7cf4a29820a6de86ea9227a62b64de Author: Soumya Velamala <87676006+svelamal@users.noreply.github.com> Date: Wed Jul 27 22:20:07 2022 -0700 Update ip_in_ip_tunnel_test.py (#5853) In test_orchagent_standby_tor_downstream.py, DF bit is set from Cisco-8000 silicon one ASIC since fragmentation on the encapsulated packet is not supported and the expected packet doesn't have it set. This causes the tests to fail despite receiving the complete expected packets. In reference to https://datatracker.ietf.org/doc/html/rfc2003, the outer packet can have the DF bit set when the inner packet does not. Identification, Flags, Fragment Offset These three fields are set as specified in [10]. However, if the "Don't Fragment" bit is set in the inner IP header, it MUST be set in the outer IP header; if the "Don't Fragment" bit is not set in the inner IP header, it MAY be set in the outer IP header, as described in Section 5.1. 
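The RFC 2003 rule quoted in #5853 above reduces to a single predicate on the two DF bits. A minimal sketch, assuming the 3-bit IPv4 flags value of each header is available as an integer (DF = 0x2, MF = 0x1); `IP_FLAG_DF` and `outer_df_is_valid` are illustrative names, not identifiers from ip_in_ip_tunnel_test.py:

```python
IP_FLAG_DF = 0x2  # "Don't Fragment" bit within the 3-bit IPv4 flags value


def outer_df_is_valid(inner_flags: int, outer_flags: int) -> bool:
    """Return True if the outer header's DF bit is permitted by RFC 2003.

    inner DF set   -> outer DF MUST be set
    inner DF clear -> outer DF MAY be set or clear
    """
    inner_df = bool(inner_flags & IP_FLAG_DF)
    outer_df = bool(outer_flags & IP_FLAG_DF)
    # The only forbidden combination is inner DF set with outer DF clear.
    return outer_df or not inner_df
```

With this rule, an encapsulated packet whose outer header has DF set and whose inner header does not is still an acceptable match, which is the case the commit describes for the Cisco-8000 ASIC.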
commit a5ce7f33d6800005b5146ea7a53743711c0d89de Author: slutati1536 <69785882+slutati1536@users.noreply.github.com> Date: Thu Jul 28 08:18:31 2022 +0300 Remove ONIE downgrade from fwutil tests (#5920) What is the motivation for this PR? We had a bug where the switch doesn't reboot back to sonic after ONIE update. Upon further investigation this bug reproduces on a downgrade from ONIE 5.3.007 to a prior version of ONIE. The cause for this bug is that the current ONIE release (ONIE 5.3.007) has a new E2FS package that is not compatible with the old E2FS package version that existed in previous ONIE versions. As we do not expect this flow to occur in production, only in testing, it was decided to remove the ONIE downgrade from fwutil tests. How did you do it? By removing the ONIE from the random components used in the tests. How did you verify/test it? Run the fwutil tests with the change commit 327a2850029347ecfb8b2f3aab316de60127acbc Author: lipxu <108326363+lipxu@users.noreply.github.com> Date: Thu Jul 28 13:14:55 2022 +0800 Fix test_buffer_deployment case NoneType issue (#6050) What is the motivation for this PR? Case test_buffer_deployment failed on the Broadcom devices. How did you do it? Init lossless headroom data for non-mellanox device How did you verify/test it? Re-run the failure case commit ff373cc764a6214e4401c3e9bdf497567e815be8 Author: Anton Ptashnik <antonx.ptashnik@intel.com> Date: Thu Jul 28 07:35:47 2022 +0300 Skipped test_pfc_asym_off_rx_pause_frames for Barefoot platform (#6047) commit fcfe9ebe91151bd54da5bc186d64b7a1d2290ece Author: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com> Date: Thu Jul 28 07:35:15 2022 +0300 Added ignoring expected errors in test_add_rack TC (#6038) What is the motivation for this PR? There are scenarios, where TC applies patch by using config_updater, but updating is still considered as failed: ``` Failed to apply patch Usage: config apply-patch [OPTIONS] PATCH_FILE_PATH Try "config apply-patch -h" for help. 
Error: After applying patch to config, there are still some parts not updated ``` Steps from generic_patch.py module cover this case, but TC still failed due to errors of sonic_yang in syslog. So was added fixture which provides an approach for ignoring those errors. How did you do it? Added fixture which provides an approach for ignoring expected error messages of syslog in test_add_rack TC. How did you verify/test it? Run test cases. Tests passed. configlet/test_add_rack.py::test_add_rack PASSED [100%] Signed-off-by: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com> commit 7e7bb7415f6fb8985ad34e2e1df6b442bd0f7750 Author: andywongarista <78833093+andywongarista@users.noreply.github.com> Date: Wed Jul 27 21:34:01 2022 -0700 [platform_tests/api] Fix reading voltage and max_supp_power issue of the psu test (#6013) What is the motivation for this PR? Fix issues in psu test that are causing failures on Arista platforms How did you do it? Fix test_power to check if reading voltage is supported for psu Fix test_power to correctly check against max_supp_power How did you verify/test it? Tested on Arista platform commit 56b57a2f9a1026937ee2f27a654fe3d0b93f98d6 Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Thu Jul 28 07:32:13 2022 +0300 [VXLAN] Stabilized VNET VXLAN test in case of scale (#6009) Changed logic to do sleep for longer time after applying VNET VXLAN configs, because current logic does not work in case of scale for example with 330000 vnet routes. Now in case of scale we will wait some time until all configuration applied What is the motivation for this PR? When run test tests/vxlan/test_vnet_vxlan.py with scale num routes 33000 test will fail, because not all routes applied and Wr_ARP test also will fail because warm-reboot does not happen (configuration still in progress). Now this issue fixed. How did you do it? Added sleep in case of scale How did you verify/test it? 
Executed test tests/vxlan/test_vnet_vxlan.py with and without scale Signed-off-by: Petro Pikh <petrop@nvidia.com> commit b60dcddded76780365c89a82abd98907f8943b9a Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Thu Jul 28 09:23:25 2022 +0800 [testplan] Add testplan for tunnel QoS remapping (#5508) * Add testplan for tunnel QoS remapping Signed-off-by: bingwang <wang.bing@microsoft.com> commit debe6495bacc7022dd099571f8709e7113029616 Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com> Date: Wed Jul 27 18:04:30 2022 -0700 [MASIC] [PR checks] [mgmt] Add sonic-mgmt PR check (#6043) * Add 'image' option to get sonic-vs-4asic.img * Add multi-asic job * run multi-asic-t1-lag-pr for now which only runs test_bgp_fact.py * Fix space * update dutname to vlab-08 * Update azure-pipelines.yml for Azure Pipelines * Revert back deleted jobs * Add back empty line * Update azure-pipelines.yml * Update displayname * Align format * Make the PR check optional for now * Remove spaces * Remove spaces commit 2da3aad03cd74828f635e7b1d9e7999ea5ea91f3 Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com> Date: Thu Jul 28 08:44:05 2022 +0800 [QoS] Verify the additional lossless queues and PGs in QoS test in dual ToR scenario (#5947) * Provide a WA to verify the additional lossless queues and PGs in QoS test in dual ToR scenario 1. Add a CLI option to specify whether it is to verify ports with or without additional lossless PGs and queues. It will collect all the dual ToR ports at the beginning of the test. If the option is on, the additional lossless PGs/queues will be verified in the corresponding tests 2. Update the logic to fetch buffer profiles from BUFFER_QUEUE and BUFFER_PG table according to the CLI 2. Two additional profiles are introduced for verifying the additional lossless PGs/queues in XON and XOFF test if the CLI option is on 3. For headroom pool test, all 4 DSCPs are passed to PTF script as well as the CLI option. 
The additional DSCPs will be skipped by the PTF docker according to whether the ports are with additional lossless PGs/queues. 4. For DSCP2queue mapping test, the CLI option is passed to PTF script so that the latter is able to check the mapping according to the CLI option. Signed-off-by: Stephen Sun <stephens@nvidia.com> commit 6af5158a3c7df54c939f1273ab03a1f306246ef7 Author: siqbal1986 <shahzad.iqbal@gmail.com> Date: Wed Jul 27 10:41:46 2022 -0700 BFD test update (#5836) Updated BFD test commit c7d19480a23cad967ffc4dee9f799bbccde6e56c Author: mannytaheri <86314901+mannytaheri@users.noreply.github.com> Date: Wed Jul 27 04:46:59 2022 -0400 Added support for returning failure reason if config_system_checks_passed fails (#5975) What is the motivation for this PR? One of the tasks performed by config_system_checks_passed definition is to check if the dut is running. This is done by executing the command "systemctl is-system-running". It returns "False" If the dut is not running but it does not check why the dut is not running. We need to add support for checking the failure reason How did you do it? Execute the command "systemctl is-system-running". Pass if dut is running. Execute the command "systemctl list-units --state=failed" if dut is not running. This will provide the failure reason. Example of failure reason: In this case tacacs-config.timer loaded failed base.py:82 /data/tests/common/devices/multi_asic.py::_run_on_asics#100: [ixre-cpm-chassis10] AnsibleModule::shell Result => {"stderr_lines": [], "cmd": "systemctl list-units --state=failed", "end": "2022-07-09 10:12:18.720054", "_ansible_no_log": false, "stdout": "UNIT LOAD ACTIVE SUB DESCRIPTION\n\u25cf tacacs-config.timer loaded failed failed Delays tacacs apply until SONiC has started\n\n LOAD = Reflects whether the unit definition was properly loaded.\nACTIVE = The high-level unit activation state, i.e. 
generalization of SUB.\n SUB = The low-level unit activation state, values depend on unit type.\n1 loaded units listed." How did you verify/test it? Tested the code on a dut when it was in the "running" state Tested the code on a dut when it was in the "degraded" state commit d4edaa3262c5bb83be7990de0e0fb9598b842886 Author: Ze Gan <ganze718@gmail.com> Date: Wed Jul 27 16:42:41 2022 +0800 [loganalyzer]Add ignore log for wpa_supplicant (#5887) #### What is the motivation for this PR? We use SIGINT to stop the wpa_supplication in macsecmgmr. but may get the following log: `wpa_supplicant[388]: eloop: could not process SIGINT or SIGTERM in two seconds. Looks like there#012is a bug that ends up in a busy loop that prevents clean shutdown.#012Killing program forcefully.` #### How did you do it? Add this log to ignore list for log analyzer. Signed-off-by: Ze Gan <ganze718@gmail.com> commit 9f45f195209b707891b21534d666109dd58f7be0 Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Wed Jul 27 11:40:02 2022 +0300 [sensors] Added new mlnx platforms which have different sensors (#6010) What is the motivation for this PR? Mlnx platforms: 3700, 3700c, 4600c could have different sensors, added logic which allow to test 3700, 3700c ,4600c with different sensors How did you do it? Added new sensors data and improved test to support different sensors(platforms) How did you verify/test it? Executed test_sensors.py Any platform specific information? msn3700, msn3700c, msn4600c Signed-off-by: Petro Pikh <petrop@nvidia.com> commit cf8fa2549d5440c62abe46c2a902cf41e01cfd61 Author: Nick Wang <sh_wang@edge-core.com> Date: Wed Jul 27 16:36:51 2022 +0800 Move test_unknown_mac.py to ARP directory because it it not related to (#6018) For test_unknown_mac.py, it is to test unknown MAC (exists in L2 FDB table but not in ARP table), which is not related to PFC feature. Therefore, the test file should be move to proper directory (ARP). 
commit bfb4a5965bbc766f826a8272b08456501e047751 Author: cgangx <95741698+cgangx@users.noreply.github.com> Date: Wed Jul 27 16:24:37 2022 +0800 Allow LDP traffic on default LDP port (#6019) Add LDP default port to iptables rule set in case LDP is enabled. What is the motivation for this PR? LDP is enabled by default in our case and should be add to iptables rule set. How did you do it? Add iptables rules allowing traffic on LDP default port. How did you verify/test it? Run test on virtual testbed. Co-authored-by: Gang Chen <gach@microsoft.com> commit a6fde46867abd986ef58bd221397ee2e694b22fd Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl> Date: Tue Jul 26 13:55:56 2022 -0700 Updated BFD multihop test to run between non-subnet ip addresses. commit 6319acec81e2cca81bfacec92bb61ebc58b858e5 Author: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com> Date: Tue Jul 26 12:58:05 2022 -0700 Avoid using asic_index as list index to get the asic (#6045) What is the motivation for this PR? #5828 - Added asics_present field in inventory to provide the list of asics that are present in supervisor. After this change, asic_index cannot be used to retrive asic_instance from duthost.asics list. This fix is done to get the correct asic_instance based on asic index, without this fix test_pretest can fail on supervisor where the asics_present are not consecutive asic_index How did you do it? remove usage of asic_index as list index. How did you verify/test it? test_pretest passes on chassis after this fix. commit f8dff6d503a8b5386822fbecd0215758b46200e2 Author: roman_savchuk <romanx.savchuk@intel.com> Date: Tue Jul 26 13:20:22 2022 +0300 Added option for enable collecting DB data when TC failed (#5403) PR 5197 fixture for collection DB data when TC failed have been introduced. This is good idea, but has one major impact to test suite / regression run. It adds extra regression time to run. 
It will be proportionally increase execution time if number of failed cases increased. What is the motivation for this PR? Add ability to add option if engineer wants to collect DB data How did you do it? Added "--collect_db_data option which enables collecting DB data if TC failed How did you verify/test it? Run cases with --collect_db_data, if TC failed - DB data collected. Run cases without --collect_db_data, if TC failed - DB not data collected. commit 4df2f37508a292280beed1936b159fb9f2004285 Author: rskorka <80551811+rskorka@users.noreply.github.com> Date: Tue Jul 26 03:08:44 2022 -0700 Added support for Cisco 8000 virtual sonic DUT (#5908) Add playbooks which allow to use T0/T1 virtual testbeds with Cisco 8000e emulator (functional emulator of Cisco 8000 Series routers). How did you do it? Add start/stop playbooks for new DUT type: 8000e. The playbook starts the emulator inside a docker container (for ease of deployment) and then vm-topology module can connect it to the topology. How did you verify/test it? ./testbed-cli.sh -m veos_vtb -n 4 -k vsonic start-vms server_1 password.txt ./testbed-cli.sh -k vsonic -t vtestbed.yaml -m veos_vtb add-topo 8000e-t0 password.txt ./testbed-cli.sh -t vtestbed.yaml -m veos_vtb deploy-mg 8000e-t0 veos_vtb password.txt cd ../tests ./run_tests.sh -n 8000e-t0 -d vlab-8k-01 -c bgp/test_bgp_fact.py -f ../ansible/vtestbed.yaml -i ../ansible/veos_vtb -e --disable_loganalyzer bgp/test_bgp_fact.py::test_bgp_facts[vlab-8k-01-None] PASSED Any platform specific information? While the 8000e emulator supports many Cisco 8000 platforms, the new 8000e playbooks support Cisco-8102-C64 platform specifically. Other 8000e platforms will be tested and enabled in the future. 
Co-authored-by: Rafal L Skorka <skorka@cisco.com> commit d0052cee1d4f8973683a08b05feab2111af88814 Author: MirceaDan <mircea-dan.gheorghe@keysight.com> Date: Tue Jul 26 03:05:29 2022 -0700 added support for IxNetwork 9.20 Update2 (#6028) updated for spytest the container environment to add support for ixnetwork 9.20u2 What is the motivation for this PR? previous container version was having support for 9.10, but i got requests to add support for 9.20 How did you do it? just a version bump and better versioning for some python packages to avoid conflicts How did you verify/test it? ran a test via the framework Co-authored-by: MirceaDan <ByReaL@users.noreply.github.com> commit f1141557e47d3d58e8b300d287520075b967acc2 Author: vperumal <vperumal@gmail.com> Date: Tue Jul 26 03:04:01 2022 -0700 Skipping test for absent psu's (#5679) What is the motivation for this PR? Currently the tests are trying to access PSU which are not present and failing for them. Even though there is a skip_psu_list, there is no code present to use it. Added the support for it in all the relevant cases. How did you do it? How did you verify/test it? Verified against cisco-8000 platform Co-authored-by: Perumal Venkatesh <pevenkat@cisco.com> commit 4033ab94a0275066b9ec7e5fbcaab03628088913 Author: oleksandrKovtunenko <104843237+oleksandrKovtunenko@users.noreply.github.com> Date: Tue Jul 26 13:02:40 2022 +0300 added skip LAG ports in case T1 TOPO Fix for issue https://github.com/Azure/sonic-mgmt/issues/5578 (#5704) Fix for issues/5578 skip LAG ports in case t1 topology usage What is the motivation for this PR? fix for #5578 exclude portchanel ports in T1 topology How did you do it? skip PortChannel interfaces on T1 topology How did you verify/test it? 
run test_qos_sai.py tests on t1-lag topology and check that portchanel ports are excluded Co-authored-by: alexander Kovtunenko <alexander198961@gmail.com> commit f5bcc52a769aee224539b73637a1b054a1b48ce2 Author: roman_savchuk <romanx.savchuk@intel.com> Date: Tue Jul 26 13:01:26 2022 +0300 [cpu_mem_usage] update TC with wait for all critical services is fully started (#5754) During regression cpu_usage TC run's after TC that does config reload at the end. Due to this syncd CPU usage is higher than TC expects as not all services come up and syncd actively works at period after system reload (reboot). What is the motivation for this PR? Make TC persistent. Avoid failures after DUT reboot (reload) How did you do it? Add wait_until method for check if all critical services come up and than run TC How did you verify/test it? Run cpu_mem_usage test case when services up (normal DUT state) and when services has been restarted. TC passed. Any platform specific information? NOTE: if TC run when all services started it takes app 180 sec to pass TC. after reboot or config reload TC run takes 180 sec + time for critical services to be started (200 sec) commit dba8c50864fed6c28fdc2b9417a2c09bf72de38f Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com> Date: Tue Jul 26 09:32:06 2022 +0800 Fix the json dumps failure issue in sonic.py (#6033) What is the motivation for this PR? When the value of facts["asics_present"] is range(0,1), then the json.dumps(facts) will throw exception: TypeError: Object of type 'range' is not JSON serializable, need to change the range(0,1) to list. This PR is to fix the json.dumps(facts) failure issue How did you do it? Change facts["asics_present"] = asics_present if len(asics_present) != 0 else range(facts["num_asic"]) to facts["asics_present"] = asics_present if len(asics_present) != 0 else list(range(facts["num_asic"])) How did you verify/test it? 
Run the test case, and the json.dumps can pass commit 8e0d21d524c939829c73c830219b78e1ff015e16 Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Tue Jul 26 08:11:49 2022 +0800 [EVERFLOW][hotfix] Skip EGRESS MIRRORING test on Broadcom platform (#6026) * Skip EGRESS MIRRORING test on Broadcom platform * Skip dnx commit 2a801bc7fef29cd3e2c6fa714f6508b9cad9b0fe Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Tue Jul 26 07:55:21 2022 +0800 Fix test case issue for dualtor in test_lag_2 (#6025) What is the motivation for this PR? test_lag_db_status failed on dualtor testbed. Two issues: It added pytest.fail wrongly. dut_name, dut_lag = decode_dut_port_name(enum_dut_portchannel_with_completeness_level) It will return one dut_name in one case, but we don't which dut is chosen for dualtor. That's why we loop duthosts to find the correct one and run test. If it loops for another duthost, we should skip it not fail the test case. Add one common function get_duthost_with_name to return duthost, reduce two indents In recover phase, it uses test_lags wrongly for dualtor. Since if test case failed in step 1, test_lags is not defined. We should loop all duthosts to check if interface status is down, if so, recover it by noshutdown. Remove duthost for loop for test_lag_db_status_with_po_update, because this case if for t1-lag only. How did you do it? Remove pytest.fail in test case test_lag_db_status and test_lag_db_status_with_po_update. Enhance recover steps. How did you verify/test it? Run pc/test_lags_2.py::test_lag_db_status. 
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 45b20e67f0aa9541a277c53928c55cda82177f6a Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com> Date: Tue Jul 26 04:40:05 2022 +0800 [Qos]Fix the testQosSaiQSharedWatermark test failure due to SAI queue watermark is not reset (#6001) configuring counterpoll watermark enable doesn't suffice to enable the queue watermark polling if the counter polling was disabled. As as result, the queue watermarks in SAI will not be clear successfully, which will fail the queue watermark test.To avoid that, counter poll should be enabled for both queue and watermark.with this command, the SAI Queue watermark can be cleared. How did you do it? Add "counterpoll queue enable", "counterpoll queue disable" in the resetWatermark fixture How did you verify/test it? Run the testQosSaiQSharedWatermark , test will not fail due to the SAI queue watermark is not cleared commit 0fefd1318014673a351d4afb9cd152cc2f3cb7ef Author: Ye Jianquan <jianquanye@microsoft.com> Date: Mon Jul 25 13:14:24 2022 -0700 [RDMA&SNAPPI] Skip snappi warm-reboot testcases on TD2 platform (#6030) How did you do it? Compare the skipped testcases of tgen and snappi, and skip the ones that need to be skipped in snappi. commit 63c03759f44afe2013c600ae8a7248397f2dea8c Author: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> Date: Mon Jul 25 13:03:04 2022 +0300 [auto-ts] Fix and stabilize auto-techsupport tests (#5986) Summary: Made changes that fixes some issues with auto-techsupport and stabilizes TC Main changes that was made: 1.) Changed --since option from '300 sec': 360 to '300 sec ago': 360 This is made because TC will take actual time and add 300 seconds to it, and will collect logs and core dump only starting from that time according to - Date-input-formats 2.) 
Check available_tech_support_files before core dump generation is triggered in test_rate_limit_interval Test might fail because when available_tech_support_files is checked after core dump was generated, techsupport dump file is already generated, and TC fails in further steps because expects new_techdump to be generated. Signed-off-by: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> commit 7157274f04c986501141774fb1b53dc9d42b81d2 Author: Xin Wang <xiwang5@microsoft.com> Date: Mon Jul 25 15:00:01 2022 +0800 Improve robustness of remove-topo operation (#6020) What is the motivation for this PR? The testbed-cli.sh remove-topo operation is to remove the topology, including cEOS neighbors, ovs bridge bindings, and ovs bridges. For some reason, the ovs bridges to be removed may be gone. In this case, the testbed-cli.sh remove-topo will fail with not able to find the bridges. Indeed, this failure is unnecessary because this operation is to remove the bridge bindings and bridges. If a bridge is already gone, then we can simply ignore it. How did you do it? This change improved the vm_topology ansible module to skip a bridge if it does not exit while trying to remove the bridge bindings and bridges. With this change, the testbed-cli.sh remove-topo always can succeed and do the job. This PR also improved the error message if run a command failed. How did you verify/test it? Tested on physical and virtual testbed. Tried remove-topo, add-topo and restart-ptf with good and broken test topology. Signed-off-by: Xin Wang <xiwang5@microsoft.com> commit 73a2e5a81045a5d2924f5fdc8c16d037ea0a704d Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com> Date: Mon Jul 25 14:45:04 2022 +0800 [dualtor][active-active] Unblock fib tests (#6006) Approach What is the motivation for this PR? Enable test_fib on dualtor-mixed testbeds. Signed-off-by: Longxiang Lyu lolv@microsoft.com How did you do it? 1. 
Add fixture mux_status_from_nic_simulator to interacts with nic_simulator to retrieve mux status for ports in active-active cable type. 2. Add fixture ptf_test_port_map_active_active to build ptf port map that supports dualtor-mixed testbed. The key difference is that, for packets ingressing ptf ports connected to DUTs' active-active cable type ports, the target DUT could be either upper ToR or lower ToR(both ToRs are active), the generated ptf port mapping will be like: ``` u'3': {u'asic_idx': 0, u'target_dest_mac': u'00:aa:bb:cc:dd:ee', u'target_dut': [0, 1], u'target_src_mac': [u'2c:dd:e9:0f:4e:50', u'2c:dd:e9:0f:3d:4c']} ``` 3. Enable ptftests hash_test and fib_test to do I/O verification for this multi-next-hop scenario. For a packet sent from ptf port that connects to an active-active port, the test will try to use the fibs from each ToR to parse its nexthops, and verify the packet forwarding on those nexthops. For ECMP behavior, the test will only check balancing over a single ToR. For example, if a packet destinated to 1.1.1.1 is sent to ptf port eth3, which is connected Ethernet12 of both ToRs, the test will determine the nexthops on both ToRs, if both ToRs forward this packet with the default route, and upper ToR will forward this packet to ptf ports [30, 32, 34, 36], and lower ToR will forward this packet to ptf ports [31, 33, 35, 37], the test will try to verify the packet forwarding on ports [30, 31, 32, 33, 34, 35, 36, 37]. And for balancing, the test will verify the traffic is balanced over a single ToR, so it will try to verify balancing on ports set [30, 32, 34, 36] or [31, 33, 35, 37] separately. How did you verify/test it? Run test_fib on dualtor, dualtor-mixed, t0, and t1 Any platform specific information? Supported testbed topology if it's a new test case? commit e027d9b449d78ac4702e8d8fbb840302eae516e3 Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl> Date: Sat Jul 23 16:45:36 2022 -0700 Added support ofr multi-hop bfd testing. 
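The forwarding and balancing checks described for #6006 above can be sketched as follows. This is an illustration only: the helper names are hypothetical and are not taken from fib_test.py or hash_test.py, and the port lists are the ones used in the commit message's example.

```python
def expected_forwarding_ports(nexthops_per_tor):
    """Union of every ToR's next-hop PTF ports (used for the forwarding check)."""
    ports = set()
    for tor_ports in nexthops_per_tor:
        ports.update(tor_ports)
    return sorted(ports)


def balancing_group(rx_port, nexthops_per_tor):
    """Return the next-hop list of the single ToR that owns rx_port, or None.

    Balancing is only checked within one ToR, so the group is the port list
    of whichever ToR actually forwarded the packet.
    """
    for tor_ports in nexthops_per_tor:
        if rx_port in tor_ports:
            return list(tor_ports)
    return None


upper_tor = [30, 32, 34, 36]  # next hops computed from the upper ToR's fib
lower_tor = [31, 33, 35, 37]  # next hops computed from the lower ToR's fib
all_ports = expected_forwarding_ports([upper_tor, lower_tor])
```

A packet received on port 33 would be accepted by the forwarding check (33 is in `all_ports`) and counted towards the balance of the lower ToR's group only.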
commit 857e83ef07a48af973fa458b268f6db67b0f1454
Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
Date: Sat Jul 23 11:01:54 2022 +0800

[dualtor] Leave `icmp_responder` running on `dualtor-mixed` testbeds (#5972)

Approach

What is the motivation for this PR?
On dualtor-mixed testbeds, leave icmp_responder running to avoid introducing unnecessary toggles.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

How did you do it?
Comment out the teardown that stops icmp_responder.

How did you verify/test it?

commit ca4d0fa69063a1a2b3a119d4103304f35d7c0634
Author: Xin Wang <xiwang5@microsoft.com>
Date: Fri Jul 22 13:56:13 2022 +0800

Fix filename issue and collect important DB dumps (#6007)

What is the motivation for this PR?
The autouse fixture collect_db_dump has some issues:
* If a test case name contains special characters, the fixture may fail.
* It automatically collects dumps of all databases for each failed test case. Collecting all the DB dumps is unnecessary.

How did you do it?
* The collected dumps are stored in a folder named after the test case. However, a test case name may contain characters that can't be used in a file name. This change adds a utility function that converts a string to a safe filename by removing characters that are illegal in filenames.
* Not all the collected database dumps are useful for later troubleshooting. This change improves the fixture to collect dumps of only some important databases.
* This change also deletes the fetch_dbs function, which is not used anywhere.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>

commit cfc550a27ace8043dd171c113b50a8627c03ecda
Author: Shahzad Iqbal <shahzadiqbal@microsoft.com>
Date: Thu Jul 21 16:38:29 2022 -0700

LGTM Alrts fixed. removed unused function.
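The filename conversion described for #6007 above could look roughly like the sketch below. The function name, the set of characters treated as illegal, and the default replacement are all assumptions made for illustration; the commit message does not list them. The commit describes removing the characters, which corresponds to passing an empty `replacement` here.

```python
import re

# Assumed set: characters rejected by common filesystems, plus whitespace.
_ILLEGAL_FILENAME_CHARS = re.compile(r'[\\/:*?"<>|\s]')


def safe_filename(name: str, replacement: str = "_") -> str:
    """Return name with every assumed-illegal character substituted."""
    return _ILLEGAL_FILENAME_CHARS.sub(replacement, name)
```

Substituting rather than dropping keeps word boundaries visible in the resulting folder names; either choice yields a name that can be used as a path component.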
commit e33b2284dbe6ae74234c901602d1e1ffaceaf5dd
Author: ppikh <70200079+ppikh@users.noreply.github.com>
Date: Thu Jul 21 19:41:50 2022 +0300

[auto_techsupport] aligned auto_tech_support tests with latest CLI on SONiC master image (#5954)

[auto_techsupport] aligned auto_tech_support tests with latest CLI on SONiC master image
The CLI has been changed, and as a result the tests started to fail. The CLI parser is now fixed by implementing multi-branch support, and the tests should pass.

Signed-off-by: Petro Pikh <petrop@nvidia.com>

commit ffcdd14920e760bc2adeeb9ea6e031075f0c4b55
Author: StormLiangMS <89824293+StormLiangMS@users.noreply.github.com>
Date: Thu Jul 21 01:54:29 2022 -0700

[vlan/test_vlan_ping] skip test on broadcom platform #6012

What is the motivation for this PR?
To skip the test test_vlan_ping, which doesn't work on the broadcom platform.

How did you do it?
Add the test to tests/common/plugins/conditional_mark/tests_mark_conditions.yaml to skip it.

How did you verify/test it?
Any platform specific information?
broadcom platform
Supported testbed topology if it's a new test case?

commit b1866a1ba00b2b338356e5999c09c549cb7c64a3
Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
Date: Thu Jul 21 11:28:16 2022 +0800

[nic_simulator] Add timeout and common options (#6005)

Approach
What is the motivation for this PR?
The mgmt client could not talk to the gRPC server.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

How did you do it?
Bring up the loopback device to enable the mgmt service to talk to each gRPC server interacting with SONiC.
Add common gRPC options to the nic_simulator.
Add a timeout for the gRPC calls from the mgmt service.

How did you verify/test it?
Verify that the mgmt client can talk to the mgmt service of `nic_simulator`.

Any platform specific information?
commit 9acb53139a151a7ade913335b72a4cfa778781d5
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Thu Jul 21 10:40:50 2022 +0800

Update require option to False for os_version in test_reporter.py to avoid uploading failure in nightly test (#6015)

What is the motivation for this PR?
https://github.com/Azure/sonic-mgmt/pull/5992 wrongly added the os_version parameter with required set to True. It should be False, because with required True, test_reporter.py fails to run for the current nightly test. This impacts the nightly test, so the script should be robust against the current nightly test yaml file.

How did you do it?
Change it to required False.

How did you verify/test it?
Run nightly test with pipeline.

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>

commit 656624555e2fa5f99a6ceef740b05d96bd5790ff
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Thu Jul 21 08:47:30 2022 +0800

Enhance test report to include pipeline results (#5992)

What is the motivation for this PR?
Currently, when the nightly test pipeline fails before running tests, the test report upload fails too, because there is no XML file and an error is thrown. We can't tell whether the pipeline did not run on that day or whether it failed.

How did you do it?
Enhance the test report upload scripts to record the pipeline status and upload it to kusto.
Add a new collect_azp_results.py to collect task status for a specific pipeline.
If there is no XML file, upload the summary table with 0 values.
Create a new table TestReportPipeline, record testbed name, os version, successful tasks, failed tasks and cancelled tasks, and upload the record to kusto.

How did you verify/test it?
Run nightly test and check kusto.
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>

commit cbab17e5b512a9f5028dfde05ff7995df9028d1f
Author: ppikh <70200079+ppikh@users.noreply.github.com>
Date: Thu Jul 21 03:30:47 2022 +0300

[conditional_mark] Improved conditional_mark plugin to support "OR" or "AND" condition between condition in conditions list (#6008)

Improved the conditional_mark plugin to support an "OR" or "AND" operator between the conditions in the conditions list.
Previously the conditions in the conditions list were always combined with AND. Now the "conditions_logical_operator" argument can specify which operation is performed between the conditions.
Possible arguments (AND is used by default if the argument is not provided):
```
conditions_logical_operator: or
conditions_logical_operator: and
```
Example of usage (the test will be ignored if the first or the second condition in the list is True):
```
ecmp/test_fgnhg.py:
  skip:
    reason: "Testcase ignored - check ignore condition in ignore file"
    conditions_logical_operator: or
    conditions:
      - "https://redmine.x.com/issues/12345 and 'msn2' in platform"
      - https://redmine.x.com/issues/54321
```

Signed-off-by: Petro Pikh <petrop@nvidia.com>

commit f526c6f6de1de6e100bf59240155b1d50a824e9d
Author: Richard.Yu <richard…

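The `conditions_logical_operator` behavior described in the conditional_mark change (#6008) above amounts to choosing between "all conditions hold" and "any condition holds". A minimal sketch of that combination step, assuming each condition has already been evaluated to a boolean, is shown here; `combine_conditions` is a hypothetical name and not the plugin's own function.

```python
def combine_conditions(results, logical_operator="and"):
    """Combine already-evaluated condition results into one decision.

    results is a list of booleans, one per entry of the `conditions` list.
    logical_operator mirrors the `conditions_logical_operator` key and
    defaults to "and", matching the default described in the change.
    """
    operator = logical_operator.lower()
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError("unsupported logical operator: {}".format(logical_operator))
```

For the `ecmp/test_fgnhg.py` example, the mark would apply when either condition evaluates to True, because the operator is `or`.
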
…-net#6068) For cisco-8000 platforms, set forward action on Rx in presence of pfc-wd. Change is made after: sonic-net#5665

* Squashed commit of the following:

commit 0b6042544e8dcccdcd79a25c7748fd11b9bc27ad
Author: siqbal1486 <shahzad.iqbal@microsoft.com>
Date: Wed Aug 10 15:26:43 2022 -0700

changes suggested in review. cleanup

commit 0cc1d72b7e0c5da97815fc0a69d12d2a0c2171a9
Merge: f6f02f03 6850440d
Author: siqbal1486 <shahzad.iqbal@microsoft.com>
Date: Wed Aug 10 14:31:28 2022 -0700

Merge branch 'bfd_test_multihop' of https://github.com/siqbal1986/sonic-mgmt into bfd_test_multihop

commit 6850440d5f90a1e2ae0d78c2f2f42f9fc39b3c95
Merge: 5924f75c 93323578
Author: siqbal1986 <shahzad.iqbal@gmail.com>
Date: Wed Aug 10 14:30:19 2022 -0700

Merge branch 'master' into bfd_test_multihop

commit f6f02f036b767dc5012b55bee92f4f3944470083
Merge: f78da62f 5924f75c
Author: siqbal1486 <shahzad.iqbal@microsoft.com>
Date: Wed Aug 10 13:06:42 2022 -0700

Merge branch 'bfd_test_multihop' of https://github.com/siqbal1986/sonic-mgmt into bfd_test_multihop

commit 9332357850282dd61dba5bbfbe68463dd088e91d
Author: Jibin Bao <jbao@nvidia.com>
Date: Thu Aug 11 00:09:18 2022 +0800

Add test plan for syslog source ip feature (#5943)

commit fb51ba2b092ea48d9233e9f3efcc1811afef2668
Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com>
Date: Thu Aug 11 00:06:43 2022 +0800

[Qos]TestQosSai should not be skipped on ptf32, ptf64 topo (#6112)

- What is the motivation for this PR?
For the mellanox asic, TestQosSai supports running on the ptf32 and ptf64 topos, so it should not be skipped on these topos.
- How did you do it?
Add support for ptf32,ptf64 in tests/common/plugins/conditional_mark/tests_mark_conditions.yaml
- How did you verify/test it?
Run the TestQosSai on ptf topo, and it is not skipped.
Change-Id: I6d37aca287e8e797ae43de903920fb61c2e1ae9c

commit 22ac478a87f1c81643e6733cb3090e8d5f696d9e
Author: Ashwin Srinivasan <93744978+assrinivasan@users.noreply.github.com>
Date: Wed Aug 10 08:56:03 2022 -0700

Removed the superfluous pdb trace command from the get_healthy_psu_num function in test_platform_info (#6135)

commit 22fb68f8ade261eabaf323ba85ec63028d324d75
Author: Cong Hou <97947969+congh-nvidia@users.noreply.github.com>
Date: Wed Aug 10 23:16:45 2022 +0800

[sub-interface] use OrderedDict instead of built-in dict for ptf and dut ports in get_port() function of sub-interface test (#6125)

The function get_port() in tests/sub_port_interfaces/sub_ports_helpers.py uses a built-in dictionary to store the dut ports and ptf ports selected for the subinterface test. However, because there's no order in the built-in dict, the dut port can sometimes be paired with a wrong ptf port, which causes the test to fail.
In the function get_ports() the dut ports are returned as a dict and the ptf ports are returned as a list of the dict values, and they are zipped in the caller for iteration. It is not guaranteed that, when zipping, the dut port is paired with the correct ptf port. For example in tests/sub_port_interfaces/conftest.py
So OrderedDict needs to be used instead of the built-in dictionary to store the selected dut ports and ptf ports in get_ports().

commit f3748cfef4bca1604037ab116586c7b33a2c8b81
Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com>
Date: Wed Aug 10 20:49:59 2022 +0800

[bugfix] skip vlan/test_vlan_ping.py (#6137)

Description of PR
In PR #5708, we skip the test cases in tests/common/plugins/conditional_mark/tests_mark_conditions.yaml. There was a merge conflict, and the skip for vlan/test_vlan_ping.py when the asic_type is broadcom was forgotten. In this PR, skip this module.

What is the motivation for this PR?
In PR #5708, we skip the test cases in tests/common/plugins/conditional_mark/tests_mark_conditions.yaml.
There was a merge conflict, and the skip for vlan/test_vlan_ping.py when the asic_type is broadcom was forgotten. In this PR, skip this module.

How did you do it?
Add the condition to skip vlan/test_vlan_ping.py.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>

commit b1f80d1cc63b092f280e6ed3d6e80da251b1fbed
Author: Kostiantyn Yarovyi <kostiantynx.yarovyi@intel.com>
Date: Wed Aug 10 13:03:29 2022 +0200

add sleep after remove vrf (#6133)

What is the motivation for this PR?
The vrf does not have enough time to be removed before it is created again. Therefore the test TestVrfDeletion::test_vrf1_neigh_after_restore failed.

How did you do it?
add sleep

How did you verify/test it?
run vrf/test_vrf.py::TestVrfDeletion::test_vrf1_neigh_after_restore

commit 5b9a30c112c08b611c9212efdc30262c31ce7cd1
Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com>
Date: Wed Aug 10 16:33:16 2022 +0800

Restore tacacs_server after the module tacacs/test_accounting.py running. (#6117)

Description of PR
In module tacacs/test_accounting.py, the fixture check_tacacs uses the function setup_tacacs_client to delete the default tacacs server and set the ptf mgmt ip as the tacacs server ip. But it doesn't restore this config when the module finishes running. We want to keep the config consistent before and after the test case runs, so fix it.

What is the motivation for this PR?
In module tacacs/test_accounting.py, the fixture check_tacacs uses the function setup_tacacs_client to delete the default tacacs server and set the ptf mgmt ip as the tacacs server ip. But it doesn't restore this config when the module finishes running. We want to keep the config consistent before and after the test case runs, so fix it.

How did you do it?
Get the default tacacs servers and put them into a list. When the module finishes running, delete the ptf mgmt ip and restore the default tacacs server ip.

How did you verify/test it?
Run the test cases in this module and compare the tacacs server ip before and after running.
Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>

commit a60e5a6e1d32ea7fa104046d36aae8d1ba707dd9
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Wed Aug 10 15:02:15 2022 +0800

Add StartTimestamp column in TestReportPipeline table (#6132)

What is the motivation for this PR?
Add StartTimestamp column in TestReportPipeline table

How did you do it?
Use another API to get the start time of the pipeline and upload it to Kusto.

How did you verify/test it?
python3 collect_azp_results.py 8888
python3 report_uploader.py -c "test_result" -e "vms-t0-kvm.201911#132728" -t "vms-t0-kvm" -i "http://****/sonic-broadcom.bin" results SonicTestData

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>

commit 5514acb1a07a61284ce481d788274ad5d2a9ac18
Author: Ihor Chekh <ichekh@nvidia.com>
Date: Wed Aug 10 00:42:20 2022 +0300

BFD test fixes and improvements (#6082)

*Single hop BFD test fixes and improvements

commit f78da62f1afc481fd5b38dc716fce18163a88625
Merge: 3ea15a68 c309ff26
Author: siqbal1986 <shahzad.iqbal@gmail.com>
Date: Tue Aug 9 13:01:16 2022 -0700

Merge branch 'master' into bfd_test_multihop

commit 5924f75c2db3dbad70c6373f989fdb20b74345d4
Merge: 3ea15a68 c309ff26
Author: siqbal1986 <shahzad.iqbal@gmail.com>
Date: Tue Aug 9 13:01:16 2022 -0700

Merge branch 'master' into bfd_test_multihop

commit c309ff26b1a4dd1783782f68aeffe464ca68f463
Author: Ye Jianquan <jianquanye@microsoft.com>
Date: Tue Aug 9 17:25:36 2022 +0800

[TestbedV2]Convert t1-lag pr test to TestbedV2 (#6127)

Convert t1-lag pr test to TestbedV2

Approach
What is the motivation for this PR?
Convert the t1-lag pr test to TestbedV2, to reduce test time by distributing test cases across multiple instances.
Currently, the preparation of the testbed (add-topo, deploy-mg) is performed implicitly; until the testbed is ready, the progress of the test plan stays at 0. We will refine the progress indicator in a future release.
The conversion can be dynamically reverted by modifying an AZP library variable: Testbed-Tools/RUN_TEST_BY_SCHEDULER : YES/NO

How did you do it?
Modify the pipeline yaml file. After converting to TestbedV2, the AZP only creates the test plan and polls the result of the test plan.

How did you verify/test it?
The pass result of this pr is the test result of this pr.

Signed-off-by: Jianquan Ye<jianquanye@microsoft.com>

commit c3f124f34a37e1fca93aab2e912b418cbb084841
Author: ppikh <70200079+ppikh@users.noreply.github.com>
Date: Tue Aug 9 06:20:48 2022 +0300

Fixed dut_basic_facts ansible module to have support SONiC images which does not have attribute "is_supervisor" (#6118)

Description of PR
Fixed the dut_basic_facts ansible module to support SONiC images which do not have the attribute "is_supervisor".
Previously, when we called the dut_basic_facts ansible module on a SONiC image which does not have the attribute "is_supervisor" (for example: 202012), we received the error: AttributeError("'module' object has no attribute 'is_supervisor'"
Now the issue is fixed and the script will work on all SONiC branches.
Issue introduced in PR: #5708

Summary:
Fixed the dut_basic_facts ansible module to support SONiC images which do not have the attribute "is_supervisor".

What is the motivation for this PR?
Fix AttributeError("'module' object has no attribute 'is_supervisor'"

How did you do it?
See code

How did you verify/test it?
Executed ansible module: dut_basic_facts

Signed-off-by: Petro Pikh <petrop@nvidia.com>

commit 5ee2f0cd3237114bd4bc0b0cc910dac2788f8123
Author: Ze Gan <ganze718@gmail.com>
Date: Tue Aug 9 10:11:39 2022 +0800

Revert "[kvmtest.sh]: Ignore test_t0_sonic temporarily (#6104)" (#6119)

This reverts commit d3bc674964cf4244994bb204b37e0c19e140ca10.
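
The dut_basic_facts fix (#6118) above only says "See code", so the actual change is not shown here. One common way to tolerate a module attribute that exists only on newer images is a guarded lookup, sketched below under that assumption. `read_is_supervisor` and the stand-in objects are hypothetical; they are not taken from the ansible module.

```python
import types


def read_is_supervisor(device_info, default=False):
    """Return device_info.is_supervisor() when available, else a default.

    device_info stands in for whatever module exposes is_supervisor on
    newer images; on older images the attribute is simply absent.
    """
    checker = getattr(device_info, "is_supervisor", None)
    if callable(checker):
        return bool(checker())
    return default


# Stand-ins for an older image (no attribute) and a newer image.
old_image_module = types.SimpleNamespace()
new_image_module = types.SimpleNamespace(is_supervisor=lambda: True)
```

Calling `read_is_supervisor(old_image_module)` falls back to the default instead of raising AttributeError, which is the failure mode the commit describes.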
commit f46f36171819265a3514c48509e8e8b685593ae2
Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com>
Date: Tue Aug 9 07:57:36 2022 +0800

[bugfix] Fix an error in tests_mark_conditions.yaml (#6113)

Description of PR
There is a condition error in tests_mark_conditions.yaml, fix it.

What is the motivation for this PR?
There is a condition error in tests_mark_conditions.yaml, fix it.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>

commit bf02aeae1d6c8e7d4523e21ec5fdf8e04688b689
Author: Lawrence Lee <lawlee@microsoft.com>
Date: Mon Aug 8 14:51:06 2022 -0700

[dualtor]: Resolve neighbor after neighbor removal (#6071)

- After restarting arp_responder during the test, also restart arp_update process on the DUT to resolve failed neighbor entries
- Improve test case cleanup

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

commit 41d7b15524017f9a267ff010846d5a1f9681b307
Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com>
Date: Mon Aug 8 20:34:45 2022 +0800

Add rif loopback action test plan (#5956)

Add test plan for the RIF interface loopback action feature.
The HLD for the RIF interface loopback action: https://github.com/sonic-net/SONiC/blob/master/doc/ip-interface/loopback-action/ip-interface-loopback-action-design.md

commit 24dad8f4b7036376b873c4a0a71a5b7d8a649be8
Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com>
Date: Mon Aug 8 11:01:03 2022 +0800

Remove "BGP_BBR" in config after test case test_bbr_disabled_dut_asn_in_aspath running. (#6102)

Description of PR
While the test case test_bbr_disabled_dut_asn_in_aspath runs, it changes the status of "BGP_BBR" in config db. But before the run, there is no "BGP_BBR" in config db. This causes an inconsistency in config db before and after the test case runs. In this PR, we delete the key "BGP_BBR" in config db after the test case runs.

What is the motivation for this PR?
While the test case test_bbr_disabled_dut_asn_in_aspath runs, it changes the status of "BGP_BBR" in config db. But before the run, there is no "BGP_BBR" in config db. This causes an inconsistency in config db before and after the test case runs. In this PR, we delete the key "BGP_BBR" in config db after the test case runs.

How did you do it?
Use configlet to delete the config after the test case runs.

How did you verify/test it?
Check the config db before and after the test case runs.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>

commit e011ed0ac2ea7e062bf4f6d55177ecfd4e907569
Author: Jing Zhang <zhangjing@microsoft.com>
Date: Sun Aug 7 18:42:01 2022 -0700

Enable `test_normal_op` test cases on active-active dualtor interfaces (#5984)

Approach
What is the motivation for this PR?
To enable dualtor io tests on active-active dualtor ports.

How did you do it?
1. Added control utilities in nic_simulator_control for toggling active-active interfaces to standby/active states on either or both duthosts. Toggles are triggered through the command line, which differs from active-standby ports.
2. Added the active-active type in the cable_type fixture.
3. Updated test_normal_op cases to adapt to active-active interfaces. For some cases, disruption is no longer expected.
4. Adjusted test names and comments to better suit today's usage.

How did you verify/test it?
Run test cases on mixed topology.

commit d3bc674964cf4244994bb204b37e0c19e140ca10
Author: Ze Gan <ganze718@gmail.com>
Date: Sun Aug 7 18:41:38 2022 +0800

[kvmtest.sh]: Ignore test_t0_sonic temporarily (#6104)

What is the motivation for this PR?
There is a bug when vsonic is used as neighbor devices. Ignore t0_sonic temporarily and add it back once the bug is fixed.

How did you do it?
Add || true in ./run_tests.sh to ignore the test result.
Signed-off-by: Ze Gan <ganze718@gmail.com>

commit 35a1f1e5b0fc788648dab9405ec2a478732ed99e
Author: Xin Wang <xiwang5@microsoft.com>
Date: Sat Aug 6 15:40:18 2022 +0800

Fix cEOS duplicated mac address issue on Ubuntu 22.04 (#6090)

What is the motivation for this PR?
When deploying a topology using cEOS, one of the steps is to create veth interfaces for the cEOS docker containers. For example, the current steps to create backplane interfaces:
1.1 Create veth pair in host for container VM0100
ip link add VM0100-back type veth peer name eth5
1.2 Add the eth5 interface to network namespace of container VM0100
2.1 Create veth pair in host for another container VM0101
ip link add VM0101-back type veth peer name eth5
2.2 Add the eth5 interface to network namespace of container VM0101

After step 1.2, eth5 is no longer in the host namespace, so step 2.1 can add another interface with the same name eth5. The problem is that on Ubuntu 22.04, the mac address of the eth5 created in step 2.1 is the same as that of the eth5 interface created in step 1.1. Possibly Ubuntu 22.04 uses a different algorithm for assigning mac addresses to new veth interfaces: if the interface name is the same, the mac address is the same too. Because all the VMxxxx-back interfaces are attached to the same ovs bridge, their peer interfaces should not use the same mac address.

How did you do it?
The fix is to create veth interfaces with unique names in the host for all cEOS containers in the beginning. Then all the interfaces in different cEOS containers have unique mac addresses.

How did you verify/test it?
Tested using 'testbed-cli.sh remove-topo' and 'testbed-cli.sh add-topo'

Signed-off-by: Xin Wang <xiwang5@microsoft.com>

commit a3268ac644162c6feff44163f023a0df41ad337a
Author: jingwenxie <jingwenxie@microsoft.com>
Date: Sat Aug 6 05:26:55 2022 +0800

[tests/configlet] Remove ignore path in addrack test (#6088)

Summary: Remove the ignore paths that were blocked by YANG before.
### Approach
#### What is the motivation for this PR?
The ignore_path should be removed in the apply-patch operation.
#### How did you do it?
Remove ignore_path.

commit 180641d9fbcbe21af61b6c75364e7d799454649e
Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
Date: Fri Aug 5 11:31:08 2022 -0700

[decap] Bug fix: add missing import for util function in test_decap (#6072)

Bug fix: add missing import for util function in test_decap
test_decap is failing on the master branch due to a bug introduced by #5834
The changes were tested on 202012, but not on master, where the json import did not exist.

commit 8ac482562757308f2bc24608d423e3fddc477c06
Author: ShiyanWangMS <shiyanwang@microsoft.com>
Date: Thu Aug 4 21:58:59 2022 -0700

Improve debug capability for testcase [test_ecn_during_decap_on_active] (#6091)

What is the motivation for this PR?
The results of the testcase (test_ecn_during_decap_on_active) are not stable. Sometimes it fails because the expected packets are not received, and there is no useful debug information in the log file.

How did you do it?
Add "portstat -c" before sending packets and add "portstat -j" after sending packets.
Add "show arp" to quickly identify which is the RX/TX port.

How did you verify/test it?
Manually run the testcase without Python error.

commit 4cac24854714eb8c52a682311bddbb454f3874ee
Author: Ashwin Srinivasan <93744978+assrinivasan@users.noreply.github.com>
Date: Thu Aug 4 09:59:14 2022 -0700

Adds a function to get the number of healthy PSUs in a device (#6060)

commit 98752da2c0410dd31696d400651c516facce62c2
Author: Yutong Zhang <90831468+yutongzhang-microsoft@users.noreply.github.com>
Date: Thu Aug 4 14:21:59 2022 +0800

Replace `pytest.skip` in test scripts with conditional marks. (#5708)

Description of PR
When we use pytest.skip in test scripts, some fixtures of the test cases are executed first, which wastes time.
When conditional marks are used to skip test cases, the cases are skipped during the collection phase, so their fixtures are not executed and execution time is saved. In this PR, we replace pytest.skip in test scripts with conditional marks to skip test cases in advance and save execution time.

What is the motivation for this PR?
When we use pytest.skip in test scripts, some fixtures of the test cases are executed first, which wastes time. When conditional marks are used to skip test cases, the cases are skipped during the collection phase, so their fixtures are not executed and running time is saved. In this PR, we replace pytest.skip in test scripts with conditional marks to skip test cases in advance and save running time.

How did you do it?
Replace pytest.skip in test scripts with conditional marks in tests_mark_conditions.yaml.

How did you verify/test it?
By running all test cases and observing the running time.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>

commit 5eb6d94ad0ce3cb25e82b4f47f21044ddc83ce41
Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Date: Thu Aug 4 10:51:01 2022 +0800

Fix issue: there should be one DSCP mapped to queue 2/6 in non dual-ToR scenarios (#6089)

Signed-off-by: Stephen Sun <stephens@nvidia.com>

commit 709d503a8f73ec6d31d9173d7c7701ae0f520ed9
Author: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com>
Date: Wed Aug 3 09:57:34 2022 -0700

[kvm]Avoid running ebtables test in KVM (#6073)

*Avoid running ebtables test in KVM
sonic-net/sonic-buildimage#11585
ebtables shouldn't be installed in KVM, because it blocks L2 forwarding. So the logic to install ebtables rules in SONiC was removed. Hence the ebtables tests are no longer executed on KVM.

commit c8ff3f0d69f558fee6b48c0d6510b5e7693ec8c2
Author: rraghav-cisco <58446052+rraghav-cisco@users.noreply.github.com>
Date: Wed Aug 3 09:07:51 2022 -0700

Adding cisco-8000 to the list of platforms for forward action.
(#6068) For cisco-8000 platforms, set forward action on Rx in presence of pfc-wd Change is made after: #5665 commit 96a460309c8c98038d075941b6f050cfb708c025 Author: Neetha John <nejo@microsoft.com> Date: Wed Aug 3 09:06:56 2022 -0700 [swap_syncd] Avoid bgp idle check since bgp docker is already down (#6083) Signed-off-by: Neetha John <nejo@microsoft.com> What is the motivation for this PR? With the latest image, qos sai tests are failing during test setup with the following error. This is due to the changes introduced in sonic-net/sonic-buildimage#11000. Since swss docker is already stopped prior to this check, bgp docker is stopped and hence the show commands will no longer work 02/08/2022 13:14:58 utilities.wait_until L0113 ERROR | Exception caught while checking ready_for_swap:Traceback (most recent call last): File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/utilities.py", line 107, in wait_until check_result = condition(*args, **kwargs) File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/system_utils/docker.py", line 186, in ready_for_swap not duthost.is_bgp_state_idle() File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/devices/multi_asic.py", line 304, in is_bgp_state_idle return self.sonichost.is_bgp_state_idle() File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/devices/sonic.py", line 1657, in is_bgp_state_idle bgp_summary = self.command("show ip bgp summary")["stdout_lines"] File "/azp/agent/_work/26/s/sonic-mgmt-int/tests/common/devices/base.py", line 89, in _run raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res) RunAnsibleModuleFail: run module command failed, Ansible Results => { "changed": true, "cmd": [ "show", "ip", "bgp", "summary" ], "delta": "0:00:00.881363", "end": "2022-08-02 13:14:58.004283", "failed": true, "invocation": { "module_args": { "_raw_params": "show ip bgp summary", "_uses_shell": false, "argv": null, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": 
null, "stdin_add_newline": true, "strip_empty_ends": true, "warn": true } }, "msg": "non-zero return code", "rc": 2, "start": "2022-08-02 13:14:57.122920", "stderr": "Usage: show ip [OPTIONS] COMMAND [ARGS]...\nTry "show ip -h" for help.\n\nError: No such command "bgp".", "stderr_lines": [ "Usage: show ip [OPTIONS] COMMAND [ARGS]...", "Try "show ip -h" for help.", "", "Error: No such command "bgp"." ], "stdout": "", "stdout_lines": [] } How did you do it? Remove the bgp idle state check How did you verify/test it? Ran the qos sai testcase with the changes and it passed commit d6e332386641a3e5344220e7b06316c7acc40722 Author: jingwenxie <jingwenxie@microsoft.com> Date: Wed Aug 3 15:42:36 2022 +0800 [tests/override_config_table] Add empty table removal test (#6087) Summary: Add E2E test for empty table removal in Golden Config What is the motivation for this PR? We should have an agreement on how Golden Config removes initial table config. This test is to verify the empty table removal in the E2E test. How did you do it? Add E2E test for empty table removal in Golden Config. How did you verify/test it? kvm test. commit a3e2a8e280da76b33486b3bc4d7c89400bbdc373 Author: jingwenxie <jingwenxie@microsoft.com> Date: Wed Aug 3 15:41:50 2022 +0800 [GCU] Change to identical DUT for cacl test (#6086) Summary: Resolve dualTor cacl test failure. What is the motivation for this PR? Change to identical DUT for cacl test2 How did you do it? Change fixture from duthost to rand_selected_dut commit 185d46f90e2b829e9304a709b4ba6caa294040dd Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Wed Aug 3 15:13:59 2022 +0800 Support 4 lossless queues in `test_buffer_deployment` (#5921) * Support 4 lossless queues in test_buffer_deployment commit 7d1fb7a2b7d3fcd24a150d59aef4f02133f237c5 Author: Ze Gan <ganze718@gmail.com> Date: Wed Aug 3 15:08:40 2022 +0800 [loganalyzer]: Add ignore log for wpa_supplicant (#6075) What is the motivation for this PR? 
During the rekey action, wpa_supplicant may emit the following log:
ERR macsec#wpa_supplicant[15]: KaY: Life time has not elapsed since prior SAK distributed
But the mka session is eventually recovered; otherwise the test cases themselves would fail, rather than the log checker.

How did you do it?
Add r, ".* ERR macsec#wpa_supplicant.*KaY: Life time has not elapsed since prior SAK distributed.*" in loganalyzer_common_ignore.txt

Signed-off-by: Ze Gan <ganze718@gmail.com>

commit 6658606edff2694f4832fe06c69ad8ac5e1452d8
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Mon Aug 1 16:42:13 2022 +0800

Add test_lag_db_status and test_lag_db_status_with_po_update back (#6074)

What is the motivation for this PR?
Add back test_lag_db_status and test_lag_db_status_with_po_update, which were reverted in #6058.

How did you do it?
Enhance these two cases to support the 202012 kvm testbed: use wait_until instead of checking the interface status immediately after shutdown or no shutdown.

How did you verify/test it?
Verified test_lag_db_status and test_lag_db_status_with_po_update on these testbeds:
* kvm testbed with master image
* kvm testbed with 202012 image
* T0
* T1-lag
* Dualtor

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>

commit 3ea15a685ed58cc93b78e8d9098ff687cd41c8fc
Merge: 17720c67 fa259b22
Author: siqbal1986 <shahzad.iqbal@gmail.com>
Date: Sun Jul 31 15:37:26 2022 -0700

Merge branch 'master' into bfd_test_multihop

commit 17720c671dd125b6ef3237b28a893648b582afe1
Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl>
Date: Sun Jul 31 15:21:47 2022 -0700

minor changes suggested in review.

commit fa259b22248a957c20ff38abe5c1c82718389fe9
Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com>
Date: Fri Jul 29 04:05:16 2022 +0800

Revert 2 test cases in test_lag_2 and reduce t1-lag running time (#6058)

What is the motivation for this PR?
test_lag_db_status_with_po_update runs successfully on the master image, but fails on image 202012.
For 202012, there is no netdev_oper_status key in PORT_TABLE of STATE_DB, so the test case uses oper_status in APPL_DB. The sync time seems to differ, however: after shutdown, the test checks oper_status immediately, whereas it should wait a while until the status is correct.
Revert these 2 cases first; a new PR will add them back after enough verification on the 202012 and master images.

How did you do it?
Two changes in this PR:
Revert test_lag_db_status and test_lag_db_status_with_po_update.
Use --completeness_level=confident to reduce running time. It picks 4 ports, not all of them, for t1-lag.

How did you verify/test it?
Run pc/test_lag_2.py

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>

commit 5153efb5b98cad7e21f61bcd01cc7b4cfcc84582
Author: Xin Wang <xiwang5@microsoft.com>
Date: Thu Jul 28 16:11:29 2022 +0800

Disable CI testing keep only PR testing (#6061)

What is the motivation for this PR?
Currently the pipeline has both PR and CI testing enabled. After a PR is merged, the pipeline is triggered again. This is a waste of resources.

How did you do it?
This change disables CI testing. The PR testing is still triggered as usual.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>

commit 15aa60774a529862c2c348f8af5435a3afc61d0a
Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com>
Date: Thu Jul 28 14:52:01 2022 +0800

Remove 'active_tor_mac' from fixture dualtor_info (#6053)

PR #5923 updated the fixture dualtor_info to add active_tor_mac. However, it caused an issue for some other test cases, because the returned dict is passed directly to check_tunnel_balance, and the function complained that
```
File "/azp/agent/_work/5/s/tests/dualtor/test_orchagent_standby_tor_downstream.py", line 204, in test_standby_tor_downstream_loopback_route_readded
    check_tunnel_balance(**params)
TypeError: check_tunnel_balance() got an unexpected keyword argument 'active_tor_mac'
```
This PR addresses the issue by removing active_tor_mac from the fixture.
The variable was added for the test cases test_encap_dscp_rewrite and test_bounced_back_traffic_in_expected_queue.

What is the motivation for this PR?
This PR fixes the issue caused by updating the fixture dualtor_info.

How did you do it?
Remove the newly added variable active_tor_mac from the fixture dualtor_info.

How did you verify/test it?
The change is verified by running test cases test_standby_tor_downstream_loopback_route_readded and test_tunnel_qos_remap.py.

commit e5daa2aa0c11c949290b2a5e576ad66798cbb81d
Author: Liu Shilong <shilongliu@microsoft.com>
Date: Thu Jul 28 14:37:42 2022 +0800

[ci] Transfer organization from Azure to sonic-net (#6059)

Description of PR
Summary: Transfer organization from Azure to sonic-net

commit 60886753bd4e4ddb582438c9a0f09711699e792d
Author: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com>
Date: Thu Jul 28 08:22:53 2022 +0300

Fix vrf "KeyError: 'target_dest_mac'" (#5785)

Vrf test cases fail with KeyError 'target_dest_mac'. This issue started to appear after PR - 5456
Changes made in this PR should fix the KeyError issue in the vrf TC
"fib_test.FibTest ... 
ERROR", "", "======================================================================", "ERROR: fib_test.FibTest", "----------------------------------------------------------------------", "Traceback (most recent call last):", " File \"ptftests/fib_test.py\", line 458, in runTest", " self.check_ip_ranges()", " File \"ptftests/fib_test.py\", line 163, in check_ip_ranges", " self.check_ip_range(ip_range, dut_index, ipv4)", " File \"ptftests/fib_test.py\", line 204, in check_ip_range", " self.check_ip_route(src_port, dst_ip, exp_ports, ipv4)", " File \"ptftests/fib_test.py\", line 237, in check_ip_route", " res = self.check_ipv4_route(src_port, dst_ip_addr, dst_port_list)", " File \"ptftests/fib_test.py\", line 267, in check_ipv4_route", " router_mac = self.ptf_test_port_map[str(src_port)]['target_dest_mac']", "KeyError: 'target_dest_mac'", Signed-off-by: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> commit 7c7d5a25ce7cf4a29820a6de86ea9227a62b64de Author: Soumya Velamala <87676006+svelamal@users.noreply.github.com> Date: Wed Jul 27 22:20:07 2022 -0700 Update ip_in_ip_tunnel_test.py (#5853) In test_orchagent_standby_tor_downstream.py, DF bit is set from Cisco-8000 silicon one ASIC since fragmentation on the encapsulated packet is not supported and the expected packet doesn't have it set. This causes the tests to fail despite receiving the complete expected packets. In reference to https://datatracker.ietf.org/doc/html/rfc2003, the outer packet can have the DF bit set when the inner packet does not. Identification, Flags, Fragment Offset These three fields are set as specified in [10]. However, if the "Don't Fragment" bit is set in the inner IP header, it MUST be set in the outer IP header; if the "Don't Fragment" bit is not set in the inner IP header, it MAY be set in the outer IP header, as described in Section 5.1. 
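The RFC 2003 rule quoted in the commit message above (#5853) can be written as a small predicate. This is only an illustrative sketch: the function name `outer_df_allowed` is hypothetical and is not taken from `ip_in_ip_tunnel_test.py`.

```python
def outer_df_allowed(inner_df: bool, outer_df: bool) -> bool:
    """Check an IP-in-IP packet's DF bits against the quoted RFC 2003 text.

    If DF is set in the inner header it MUST be set in the outer header;
    if DF is clear in the inner header the outer header MAY set it.
    """
    # The only forbidden combination is inner DF set with outer DF clear.
    return outer_df or not inner_df


# All four inner/outer combinations; only (True, False) violates the rule.
results = {
    (inner, outer): outer_df_allowed(inner, outer)
    for inner in (False, True)
    for outer in (False, True)
}
```

A test that compares received packets against an expected packet could, under this rule, accept an outer header with DF set even when the expected packet has it clear, which matches the Cisco-8000 behavior the commit describes.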
commit a5ce7f33d6800005b5146ea7a53743711c0d89de Author: slutati1536 <69785882+slutati1536@users.noreply.github.com> Date: Thu Jul 28 08:18:31 2022 +0300 Remove ONIE downgrade from fwutil tests (#5920) What is the motivation for this PR? We had a bug where the switch doesn't reboot back to sonic after ONIE update. Upon further investigation this bug reproduces on a downgrade from ONIE 5.3.007 to a prior version of ONIE. The cause for this bug is that the current ONIE release (ONIE 5.3.007) has a new E2FS package that is not compatible with the old E2FS package version that existed in previous ONIE versions. As we do not expect this flow to occur in production, only in testing, it was decided to remove the ONIE downgrade from fwutil tests. How did you do it? By removing the ONIE from the random components used in the tests. How did you verify/test it? Run the fwutil tests with the change commit 327a2850029347ecfb8b2f3aab316de60127acbc Author: lipxu <108326363+lipxu@users.noreply.github.com> Date: Thu Jul 28 13:14:55 2022 +0800 Fix test_buffer_deployment case NoneType issue (#6050) What is the motivation for this PR? Case test_buffer_deployment failed on the Broadcom devices. How did you do it? Init lossless headroom data for non-mellanox device How did you verify/test it? Re-run the failure case commit ff373cc764a6214e4401c3e9bdf497567e815be8 Author: Anton Ptashnik <antonx.ptashnik@intel.com> Date: Thu Jul 28 07:35:47 2022 +0300 Skipped test_pfc_asym_off_rx_pause_frames for Barefoot platform (#6047) commit fcfe9ebe91151bd54da5bc186d64b7a1d2290ece Author: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com> Date: Thu Jul 28 07:35:15 2022 +0300 Added ignoring expected errors in test_add_rack TC (#6038) What is the motivation for this PR? There are scenarios, where TC applies patch by using config_updater, but updating is still considered as failed: ``` Failed to apply patch Usage: config apply-patch [OPTIONS] PATCH_FILE_PATH Try "config apply-patch -h" for help. 
Error: After applying patch to config, there are still some parts not updated ``` Steps from generic_patch.py module cover this case, but TC still failed due to errors of sonic_yang in syslog. So was added fixture which provides an approach for ignoring those errors. How did you do it? Added fixture which provides an approach for ignoring expected error messages of syslog in test_add_rack TC. How did you verify/test it? Run test cases. Tests passed. configlet/test_add_rack.py::test_add_rack PASSED [100%] Signed-off-by: Oleksandr Kozodoi <oleksandrx.kozodoi@intel.com> commit 7e7bb7415f6fb8985ad34e2e1df6b442bd0f7750 Author: andywongarista <78833093+andywongarista@users.noreply.github.com> Date: Wed Jul 27 21:34:01 2022 -0700 [platform_tests/api] Fix reading voltage and max_supp_power issue of the psu test (#6013) What is the motivation for this PR? Fix issues in psu test that are causing failures on Arista platforms How did you do it? Fix test_power to check if reading voltage is supported for psu Fix test_power to correctly check against max_supp_power How did you verify/test it? Tested on Arista platform commit 56b57a2f9a1026937ee2f27a654fe3d0b93f98d6 Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Thu Jul 28 07:32:13 2022 +0300 [VXLAN] Stabilized VNET VXLAN test in case of scale (#6009) Changed logic to do sleep for longer time after applying VNET VXLAN configs, because current logic does not work in case of scale for example with 330000 vnet routes. Now in case of scale we will wait some time until all configuration applied What is the motivation for this PR? When run test tests/vxlan/test_vnet_vxlan.py with scale num routes 33000 test will fail, because not all routes applied and Wr_ARP test also will fail because warm-reboot does not happen (configuration still in progress). Now this issue fixed. How did you do it? Added sleep in case of scale How did you verify/test it? 
Executed test tests/vxlan/test_vnet_vxlan.py with and without scale Signed-off-by: Petro Pikh <petrop@nvidia.com> commit b60dcddded76780365c89a82abd98907f8943b9a Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Thu Jul 28 09:23:25 2022 +0800 [testplan] Add testplan for tunnel QoS remapping (#5508) * Add testplan for tunnel QoS remapping Signed-off-by: bingwang <wang.bing@microsoft.com> commit debe6495bacc7022dd099571f8709e7113029616 Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com> Date: Wed Jul 27 18:04:30 2022 -0700 [MASIC] [PR checks] [mgmt] Add sonic-mgmt PR check (#6043) * Add 'image' option to get sonic-vs-4asic.img * Add multi-asic job * run multi-asic-t1-lag-pr for now which only runs test_bgp_fact.py * Fix space * update dutname to vlab-08 * Update azure-pipelines.yml for Azure Pipelines * Revert back deleted jobs * Add back empty line * Update azure-pipelines.yml * Update displayname * Align format * Make the PR check optional for now * Remove spaces * Remove spaces commit 2da3aad03cd74828f635e7b1d9e7999ea5ea91f3 Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com> Date: Thu Jul 28 08:44:05 2022 +0800 [QoS] Verify the additional lossless queues and PGs in QoS test in dual ToR scenario (#5947) * Provide a WA to verify the additional lossless queues and PGs in QoS test in dual ToR scenario 1. Add a CLI option to specify whether it is to verify ports with or without additional lossless PGs and queues. It will collect all the dual ToR ports at the beginning of the test. If the option is on, the additional lossless PGs/queues will be verified in the corresponding tests 2. Update the logic to fetch buffer profiles from BUFFER_QUEUE and BUFFER_PG table according to the CLI 2. Two additional profiles are introduced for verifying the additional lossless PGs/queues in XON and XOFF test if the CLI option is on 3. For headroom pool test, all 4 DSCPs are passed to PTF script as well as the CLI option. 
The additional DSCPs will be skipped by the PTF docker according to whether the ports are with additional lossless PGs/queues. 4. For DSCP2queue mapping test, the CLI option is passed to PTF script so that the latter is able to check the mapping according to the CLI option. Signed-off-by: Stephen Sun <stephens@nvidia.com> commit 6af5158a3c7df54c939f1273ab03a1f306246ef7 Author: siqbal1986 <shahzad.iqbal@gmail.com> Date: Wed Jul 27 10:41:46 2022 -0700 BFD test update (#5836) Updated BFD test commit c7d19480a23cad967ffc4dee9f799bbccde6e56c Author: mannytaheri <86314901+mannytaheri@users.noreply.github.com> Date: Wed Jul 27 04:46:59 2022 -0400 Added support for returning failure reason if config_system_checks_passed fails (#5975) What is the motivation for this PR? One of the tasks performed by config_system_checks_passed definition is to check if the dut is running. This is done by executing the command "systemctl is-system-running". It returns "False" If the dut is not running but it does not check why the dut is not running. We need to add support for checking the failure reason How did you do it? Execute the command "systemctl is-system-running". Pass if dut is running. Execute the command "systemctl list-units --state=failed" if dut is not running. This will provide the failure reason. Example of failure reason: In this case tacacs-config.timer loaded failed base.py:82 /data/tests/common/devices/multi_asic.py::_run_on_asics#100: [ixre-cpm-chassis10] AnsibleModule::shell Result => {"stderr_lines": [], "cmd": "systemctl list-units --state=failed", "end": "2022-07-09 10:12:18.720054", "_ansible_no_log": false, "stdout": "UNIT LOAD ACTIVE SUB DESCRIPTION\n\u25cf tacacs-config.timer loaded failed failed Delays tacacs apply until SONiC has started\n\n LOAD = Reflects whether the unit definition was properly loaded.\nACTIVE = The high-level unit activation state, i.e. 
generalization of SUB.\n SUB = The low-level unit activation state, values depend on unit type.\n1 loaded units listed." How did you verify/test it? Tested the code on a dut when it was in the "running" state Tested the code on a dut when it was in the "degraded" state commit d4edaa3262c5bb83be7990de0e0fb9598b842886 Author: Ze Gan <ganze718@gmail.com> Date: Wed Jul 27 16:42:41 2022 +0800 [loganalyzer]Add ignore log for wpa_supplicant (#5887) #### What is the motivation for this PR? We use SIGINT to stop the wpa_supplication in macsecmgmr. but may get the following log: `wpa_supplicant[388]: eloop: could not process SIGINT or SIGTERM in two seconds. Looks like there#012is a bug that ends up in a busy loop that prevents clean shutdown.#012Killing program forcefully.` #### How did you do it? Add this log to ignore list for log analyzer. Signed-off-by: Ze Gan <ganze718@gmail.com> commit 9f45f195209b707891b21534d666109dd58f7be0 Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Wed Jul 27 11:40:02 2022 +0300 [sensors] Added new mlnx platforms which have different sensors (#6010) What is the motivation for this PR? Mlnx platforms: 3700, 3700c, 4600c could have different sensors, added logic which allow to test 3700, 3700c ,4600c with different sensors How did you do it? Added new sensors data and improved test to support different sensors(platforms) How did you verify/test it? Executed test_sensors.py Any platform specific information? msn3700, msn3700c, msn4600c Signed-off-by: Petro Pikh <petrop@nvidia.com> commit cf8fa2549d5440c62abe46c2a902cf41e01cfd61 Author: Nick Wang <sh_wang@edge-core.com> Date: Wed Jul 27 16:36:51 2022 +0800 Move test_unknown_mac.py to ARP directory because it it not related to (#6018) For test_unknown_mac.py, it is to test unknown MAC (exists in L2 FDB table but not in ARP table), which is not related to PFC feature. Therefore, the test file should be move to proper directory (ARP). 
commit bfb4a5965bbc766f826a8272b08456501e047751 Author: cgangx <95741698+cgangx@users.noreply.github.com> Date: Wed Jul 27 16:24:37 2022 +0800 Allow LDP traffic on default LDP port (#6019) Add LDP default port to iptables rule set in case LDP is enabled. What is the motivation for this PR? LDP is enabled by default in our case and should be add to iptables rule set. How did you do it? Add iptables rules allowing traffic on LDP default port. How did you verify/test it? Run test on virtual testbed. Co-authored-by: Gang Chen <gach@microsoft.com> commit a6fde46867abd986ef58bd221397ee2e694b22fd Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl> Date: Tue Jul 26 13:55:56 2022 -0700 Updated BFD multihop test to run between non-subnet ip addresses. commit 6319acec81e2cca81bfacec92bb61ebc58b858e5 Author: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com> Date: Tue Jul 26 12:58:05 2022 -0700 Avoid using asic_index as list index to get the asic (#6045) What is the motivation for this PR? #5828 - Added asics_present field in inventory to provide the list of asics that are present in supervisor. After this change, asic_index cannot be used to retrive asic_instance from duthost.asics list. This fix is done to get the correct asic_instance based on asic index, without this fix test_pretest can fail on supervisor where the asics_present are not consecutive asic_index How did you do it? remove usage of asic_index as list index. How did you verify/test it? test_pretest passes on chassis after this fix. commit f8dff6d503a8b5386822fbecd0215758b46200e2 Author: roman_savchuk <romanx.savchuk@intel.com> Date: Tue Jul 26 13:20:22 2022 +0300 Added option for enable collecting DB data when TC failed (#5403) PR 5197 fixture for collection DB data when TC failed have been introduced. This is good idea, but has one major impact to test suite / regression run. It adds extra regression time to run. 
It will be proportionally increase execution time if number of failed cases increased. What is the motivation for this PR? Add ability to add option if engineer wants to collect DB data How did you do it? Added "--collect_db_data option which enables collecting DB data if TC failed How did you verify/test it? Run cases with --collect_db_data, if TC failed - DB data collected. Run cases without --collect_db_data, if TC failed - DB not data collected. commit 4df2f37508a292280beed1936b159fb9f2004285 Author: rskorka <80551811+rskorka@users.noreply.github.com> Date: Tue Jul 26 03:08:44 2022 -0700 Added support for Cisco 8000 virtual sonic DUT (#5908) Add playbooks which allow to use T0/T1 virtual testbeds with Cisco 8000e emulator (functional emulator of Cisco 8000 Series routers). How did you do it? Add start/stop playbooks for new DUT type: 8000e. The playbook starts the emulator inside a docker container (for ease of deployment) and then vm-topology module can connect it to the topology. How did you verify/test it? ./testbed-cli.sh -m veos_vtb -n 4 -k vsonic start-vms server_1 password.txt ./testbed-cli.sh -k vsonic -t vtestbed.yaml -m veos_vtb add-topo 8000e-t0 password.txt ./testbed-cli.sh -t vtestbed.yaml -m veos_vtb deploy-mg 8000e-t0 veos_vtb password.txt cd ../tests ./run_tests.sh -n 8000e-t0 -d vlab-8k-01 -c bgp/test_bgp_fact.py -f ../ansible/vtestbed.yaml -i ../ansible/veos_vtb -e --disable_loganalyzer bgp/test_bgp_fact.py::test_bgp_facts[vlab-8k-01-None] PASSED Any platform specific information? While the 8000e emulator supports many Cisco 8000 platforms, the new 8000e playbooks support Cisco-8102-C64 platform specifically. Other 8000e platforms will be tested and enabled in the future. 
Co-authored-by: Rafal L Skorka <skorka@cisco.com> commit d0052cee1d4f8973683a08b05feab2111af88814 Author: MirceaDan <mircea-dan.gheorghe@keysight.com> Date: Tue Jul 26 03:05:29 2022 -0700 added support for IxNetwork 9.20 Update2 (#6028) updated for spytest the container environment to add support for ixnetwork 9.20u2 What is the motivation for this PR? previous container version was having support for 9.10, but i got requests to add support for 9.20 How did you do it? just a version bump and better versioning for some python packages to avoid conflicts How did you verify/test it? ran a test via the framework Co-authored-by: MirceaDan <ByReaL@users.noreply.github.com> commit f1141557e47d3d58e8b300d287520075b967acc2 Author: vperumal <vperumal@gmail.com> Date: Tue Jul 26 03:04:01 2022 -0700 Skipping test for absent psu's (#5679) What is the motivation for this PR? Currently the tests are trying to access PSU which are not present and failing for them. Even though there is a skip_psu_list, there is no code present to use it. Added the support for it in all the relevant cases. How did you do it? How did you verify/test it? Verified against cisco-8000 platform Co-authored-by: Perumal Venkatesh <pevenkat@cisco.com> commit 4033ab94a0275066b9ec7e5fbcaab03628088913 Author: oleksandrKovtunenko <104843237+oleksandrKovtunenko@users.noreply.github.com> Date: Tue Jul 26 13:02:40 2022 +0300 added skip LAG ports in case T1 TOPO Fix for issue https://github.com/Azure/sonic-mgmt/issues/5578 (#5704) Fix for issues/5578 skip LAG ports in case t1 topology usage What is the motivation for this PR? fix for #5578 exclude portchanel ports in T1 topology How did you do it? skip PortChannel interfaces on T1 topology How did you verify/test it? 
run test_qos_sai.py tests on t1-lag topology and check that portchanel ports are excluded Co-authored-by: alexander Kovtunenko <alexander198961@gmail.com> commit f5bcc52a769aee224539b73637a1b054a1b48ce2 Author: roman_savchuk <romanx.savchuk@intel.com> Date: Tue Jul 26 13:01:26 2022 +0300 [cpu_mem_usage] update TC with wait for all critical services is fully started (#5754) During regression cpu_usage TC run's after TC that does config reload at the end. Due to this syncd CPU usage is higher than TC expects as not all services come up and syncd actively works at period after system reload (reboot). What is the motivation for this PR? Make TC persistent. Avoid failures after DUT reboot (reload) How did you do it? Add wait_until method for check if all critical services come up and than run TC How did you verify/test it? Run cpu_mem_usage test case when services up (normal DUT state) and when services has been restarted. TC passed. Any platform specific information? NOTE: if TC run when all services started it takes app 180 sec to pass TC. after reboot or config reload TC run takes 180 sec + time for critical services to be started (200 sec) commit dba8c50864fed6c28fdc2b9417a2c09bf72de38f Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com> Date: Tue Jul 26 09:32:06 2022 +0800 Fix the json dumps failure issue in sonic.py (#6033) What is the motivation for this PR? When the value of facts["asics_present"] is range(0,1), then the json.dumps(facts) will throw exception: TypeError: Object of type 'range' is not JSON serializable, need to change the range(0,1) to list. This PR is to fix the json.dumps(facts) failure issue How did you do it? Change facts["asics_present"] = asics_present if len(asics_present) != 0 else range(facts["num_asic"]) to facts["asics_present"] = asics_present if len(asics_present) != 0 else list(range(facts["num_asic"])) How did you verify/test it? 
Run the test case, and the json.dumps can pass commit 8e0d21d524c939829c73c830219b78e1ff015e16 Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Tue Jul 26 08:11:49 2022 +0800 [EVERFLOW][hotfix] Skip EGRESS MIRRORING test on Broadcom platform (#6026) * Skip EGRESS MIRRORING test on Broadcom platform * Skip dnx commit 2a801bc7fef29cd3e2c6fa714f6508b9cad9b0fe Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Tue Jul 26 07:55:21 2022 +0800 Fix test case issue for dualtor in test_lag_2 (#6025) What is the motivation for this PR? test_lag_db_status failed on dualtor testbed. Two issues: It added pytest.fail wrongly. dut_name, dut_lag = decode_dut_port_name(enum_dut_portchannel_with_completeness_level) It will return one dut_name in one case, but we don't which dut is chosen for dualtor. That's why we loop duthosts to find the correct one and run test. If it loops for another duthost, we should skip it not fail the test case. Add one common function get_duthost_with_name to return duthost, reduce two indents In recover phase, it uses test_lags wrongly for dualtor. Since if test case failed in step 1, test_lags is not defined. We should loop all duthosts to check if interface status is down, if so, recover it by noshutdown. Remove duthost for loop for test_lag_db_status_with_po_update, because this case if for t1-lag only. How did you do it? Remove pytest.fail in test case test_lag_db_status and test_lag_db_status_with_po_update. Enhance recover steps. How did you verify/test it? Run pc/test_lags_2.py::test_lag_db_status. 
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 45b20e67f0aa9541a277c53928c55cda82177f6a Author: Nana@Nvidia <78413612+nhe-NV@users.noreply.github.com> Date: Tue Jul 26 04:40:05 2022 +0800 [Qos]Fix the testQosSaiQSharedWatermark test failure due to SAI queue watermark is not reset (#6001) configuring counterpoll watermark enable doesn't suffice to enable the queue watermark polling if the counter polling was disabled. As as result, the queue watermarks in SAI will not be clear successfully, which will fail the queue watermark test.To avoid that, counter poll should be enabled for both queue and watermark.with this command, the SAI Queue watermark can be cleared. How did you do it? Add "counterpoll queue enable", "counterpoll queue disable" in the resetWatermark fixture How did you verify/test it? Run the testQosSaiQSharedWatermark , test will not fail due to the SAI queue watermark is not cleared commit 0fefd1318014673a351d4afb9cd152cc2f3cb7ef Author: Ye Jianquan <jianquanye@microsoft.com> Date: Mon Jul 25 13:14:24 2022 -0700 [RDMA&SNAPPI] Skip snappi warm-reboot testcases on TD2 platform (#6030) How did you do it? Compare the skipped testcases of tgen and snappi, and skip the ones that need to be skipped in snappi. commit 63c03759f44afe2013c600ae8a7248397f2dea8c Author: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> Date: Mon Jul 25 13:03:04 2022 +0300 [auto-ts] Fix and stabilize auto-techsupport tests (#5986) Summary: Made changes that fixes some issues with auto-techsupport and stabilizes TC Main changes that was made: 1.) Changed --since option from '300 sec': 360 to '300 sec ago': 360 This is made because TC will take actual time and add 300 seconds to it, and will collect logs and core dump only starting from that time according to - Date-input-formats 2.) 
Check available_tech_support_files before core dump generation is triggered in test_rate_limit_interval Test might fail because when available_tech_support_files is checked after core dump was generated, techsupport dump file is already generated, and TC fails in further steps because expects new_techdump to be generated. Signed-off-by: Andrii-Yosafat Lozovyi <andrii-yosafatx.lozovyi@intel.com> commit 7157274f04c986501141774fb1b53dc9d42b81d2 Author: Xin Wang <xiwang5@microsoft.com> Date: Mon Jul 25 15:00:01 2022 +0800 Improve robustness of remove-topo operation (#6020) What is the motivation for this PR? The testbed-cli.sh remove-topo operation is to remove the topology, including cEOS neighbors, ovs bridge bindings, and ovs bridges. For some reason, the ovs bridges to be removed may be gone. In this case, the testbed-cli.sh remove-topo will fail with not able to find the bridges. Indeed, this failure is unnecessary because this operation is to remove the bridge bindings and bridges. If a bridge is already gone, then we can simply ignore it. How did you do it? This change improved the vm_topology ansible module to skip a bridge if it does not exit while trying to remove the bridge bindings and bridges. With this change, the testbed-cli.sh remove-topo always can succeed and do the job. This PR also improved the error message if run a command failed. How did you verify/test it? Tested on physical and virtual testbed. Tried remove-topo, add-topo and restart-ptf with good and broken test topology. Signed-off-by: Xin Wang <xiwang5@microsoft.com> commit 73a2e5a81045a5d2924f5fdc8c16d037ea0a704d Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com> Date: Mon Jul 25 14:45:04 2022 +0800 [dualtor][active-active] Unblock fib tests (#6006) Approach What is the motivation for this PR? Enable test_fib on dualtor-mixed testbeds. Signed-off-by: Longxiang Lyu lolv@microsoft.com How did you do it? 1. 
Add fixture mux_status_from_nic_simulator to interacts with nic_simulator to retrieve mux status for ports in active-active cable type. 2. Add fixture ptf_test_port_map_active_active to build ptf port map that supports dualtor-mixed testbed. The key difference is that, for packets ingressing ptf ports connected to DUTs' active-active cable type ports, the target DUT could be either upper ToR or lower ToR(both ToRs are active), the generated ptf port mapping will be like: ``` u'3': {u'asic_idx': 0, u'target_dest_mac': u'00:aa:bb:cc:dd:ee', u'target_dut': [0, 1], u'target_src_mac': [u'2c:dd:e9:0f:4e:50', u'2c:dd:e9:0f:3d:4c']} ``` 3. Enable ptftests hash_test and fib_test to do I/O verification for this multi-next-hop scenario. For a packet sent from ptf port that connects to an active-active port, the test will try to use the fibs from each ToR to parse its nexthops, and verify the packet forwarding on those nexthops. For ECMP behavior, the test will only check balancing over a single ToR. For example, if a packet destinated to 1.1.1.1 is sent to ptf port eth3, which is connected Ethernet12 of both ToRs, the test will determine the nexthops on both ToRs, if both ToRs forward this packet with the default route, and upper ToR will forward this packet to ptf ports [30, 32, 34, 36], and lower ToR will forward this packet to ptf ports [31, 33, 35, 37], the test will try to verify the packet forwarding on ports [30, 31, 32, 33, 34, 35, 36, 37]. And for balancing, the test will verify the traffic is balanced over a single ToR, so it will try to verify balancing on ports set [30, 32, 34, 36] or [31, 33, 35, 37] separately. How did you verify/test it? Run test_fib on dualtor, dualtor-mixed, t0, and t1 Any platform specific information? Supported testbed topology if it's a new test case? commit e027d9b449d78ac4702e8d8fbb840302eae516e3 Author: Shahzad Iqbal (SHAHZADIQBAL) <SHAHZADIQBAL@ame.gbl> Date: Sat Jul 23 16:45:36 2022 -0700 Added support ofr multi-hop bfd testing. 
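The port selection described in the active-active fib-test commit (#6006) above can be sketched as follows. The function name is hypothetical and the port lists are the ones from the commit's own example; this is not the actual `fib_test.py` code.

```python
def expected_ports_active_active(ports_per_tor):
    """Merge each ToR's next-hop PTF ports into one list for the forwarding check.

    ports_per_tor: one list of PTF port indices per ToR (assumed input shape).
    """
    return sorted(set().union(*ports_per_tor))


# Example from the commit message: a packet to 1.1.1.1 sent into eth3.
upper_tor_ports = [30, 32, 34, 36]
lower_tor_ports = [31, 33, 35, 37]

# Forwarding is verified on the union of both ToRs' next hops.
forwarding_ports = expected_ports_active_active([upper_tor_ports, lower_tor_ports])

# Balancing is verified per ToR, so the candidate groups stay separate.
balancing_groups = [upper_tor_ports, lower_tor_ports]
```

Keeping the balancing groups separate reflects the commit's statement that ECMP balance is only checked over a single ToR.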
commit 857e83ef07a48af973fa458b268f6db67b0f1454
Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
Date: Sat Jul 23 11:01:54 2022 +0800

[dualtor] Leave `icmp_responder` running on `dualtor-mixed` testbeds (#5972)

Approach

What is the motivation for this PR?
On dualtor-mixed testbeds, leave icmp_responder running to avoid introducing unnecessary toggles.

Signed-off-by: Longxiang Lyu lolv@microsoft.com

How did you do it?
Comment out the teardown that stops icmp_responder.

How did you verify/test it?

commit ca4d0fa69063a1a2b3a119d4103304f35d7c0634
Author: Xin Wang <xiwang5@microsoft.com>
Date: Fri Jul 22 13:56:13 2022 +0800

Fix filename issue and collect important DB dumps (#6007)

What is the motivation for this PR?
The autoused fixture collect_db_dump has some issues:
* If a test case name has some special characters, the fixture may fail.
* It automatically collects a dump of all databases for each failed test case. It's unnecessary to collect all the DB dumps.

How did you do it?
* The collected dumps are stored in a folder named after the test case. However, a test case name may contain characters that can't be used in a file name. This change added a utility function that removes characters illegal in filenames, converting a string to a safe filename.
* Not all the collected database dumps are useful for later troubleshooting. This change improved the fixture to collect dumps of only some important databases.
* This change also deleted the fetch_dbs function, which is not used anywhere.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>

commit cfc550a27ace8043dd171c113b50a8627c03ecda
Author: Shahzad Iqbal <shahzadiqbal@microsoft.com>
Date: Thu Jul 21 16:38:29 2022 -0700

LGTM alerts fixed. Removed unused function.
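The filename fix described in the collect_db_dump commit (#6007) above might look roughly like the sketch below. The helper name `safe_filename` and the allowed-character set are assumptions for illustration; the commit does not show the actual utility function, and it speaks of removing illegal characters whereas this sketch replaces them with underscores so that distinct test names stay distinguishable.

```python
import re


def safe_filename(name: str, replacement: str = "_") -> str:
    """Return a string usable as a file or folder name.

    Hypothetical helper: every character outside letters, digits,
    underscore, dot and hyphen is replaced with `replacement`.
    """
    return re.sub(r"[^A-Za-z0-9_.-]", replacement, name)


# A parametrized pytest test id contains brackets and colons.
folder = safe_filename("test_pfcwd_wb[no_storm]")
```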
commit e33b2284dbe6ae74234c901602d1e1ffaceaf5dd
Author: ppikh <70200079+ppikh@users.noreply.github.com>
Date: Thu Jul 21 19:41:50 2022 +0300

[auto_techsupport] Aligned auto_tech_support tests with latest CLI on SONiC master image (#5954)

The CLI has been changed and, as a result, the tests started to fail. The CLI parser is now fixed by implementing multi-branch support, and the tests should pass.

Signed-off-by: Petro Pikh <petrop@nvidia.com>

commit ffcdd14920e760bc2adeeb9ea6e031075f0c4b55
Author: StormLiangMS <89824293+StormLiangMS@users.noreply.github.com>
Date: Thu Jul 21 01:54:29 2022 -0700

[vlan/test_vlan_ping] skip test on broadcom platform #6012

What is the motivation for this PR?
To skip test_vlan_ping, which doesn't work on the broadcom platform.

How did you do it?
Add the test to tests/common/plugins/conditional_mark/tests_mark_conditions.yaml to skip it.

How did you verify/test it?

Any platform specific information?
broadcom platform

Supported testbed topology if it's a new test case?

commit b1866a1ba00b2b338356e5999c09c549cb7c64a3
Author: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
Date: Thu Jul 21 11:28:16 2022 +0800

[nic_simulator] Add timeout and common options (#6005)

Approach

What is the motivation for this PR?
The mgmt client could not talk to the gRPC server.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

How did you do it?
Bring up the loopback device so that the mgmt service can talk to each gRPC server interacting with SONiC. Add common gRPC options to the nic_simulator. Add a timeout for the gRPC calls from the mgmt service.

How did you verify/test it?
Verify that the mgmt client can talk to the mgmt service of `nic_simulator`.

Any platform specific information?
commit 9acb53139a151a7ade913335b72a4cfa778781d5 Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Thu Jul 21 10:40:50 2022 +0800 Update require option to False for os_version in test_reporter.py to avoid uploading failure in nightly test (#6015) What is the motivation for this PR? In https://github.com/Azure/sonic-mgmt/pull/5992, wrongly add os_version parameter as required True. It should be false, because if with required True, test_reporter.py will fail to run for current nightly test. It impacts nightly test, should make it robust with current nightly test yaml file. How did you do it? Change it to required False. How did you verify/test it? Run nightly test with pipeline. Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit 656624555e2fa5f99a6ceef740b05d96bd5790ff Author: Zhaohui Sun <94606222+ZhaohuiS@users.noreply.github.com> Date: Thu Jul 21 08:47:30 2022 +0800 Enhance test report to include pipeline results (#5992) What is the motivation for this PR? Currently, when nightly test pipeline fails before running test, test report upload will fail too, because there is no XML file and it will throw error out. We don't know if pipeline does not run on that day, or it fails. How did you do it? Enhance test report upload scripts to record pipeline status and upload it to kusto. Add a new collect_azp_results.py to collect task status for specific pipeline. If there is no XML file, upload summary table with 0 values. Create a new table TestReportPipeline, record testbed name, os version, success tasks, failed tasks and cancelled tasks and upload the record to kusto. How did you verify/test it? Run nightly test and check kusto. 
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com> commit cbab17e5b512a9f5028dfde05ff7995df9028d1f Author: ppikh <70200079+ppikh@users.noreply.github.com> Date: Thu Jul 21 03:30:47 2022 +0300 [conditional_mark] Improved conditional_mark plugin to support "OR" or "AND" condition between condition in conditions list (#6008) Improved conditional_mark plugin to support "OR" or "AND" operand between condition in conditions list Previously every time we did AND operand between condition in conditions list, now we can provide "conditions_logical_operator" argument with operation which should be performed between conditions. Possible arguments (by default, if not provided - AND used): ``` conditions_logical_operator: or conditions_logical_operator: and ``` Example of usage (test will be ignored if first or second condition in list True): ``` ecmp/test_fgnhg.py: skip: reason: "Testcase ignored - check ignore condition in ignore file" conditions_logical_operator: or conditions: - "https://redmine.x.com/issues/12345 and 'msn2' in platform" - https://redmine.x.com/issues/54321 ``` Signed-off-by: Petro Pikh <petrop@nvidia.com> commit f526c6f6de1de6e100bf59240155b1d50a824e9d Author: Richard.Yu <richard…

For zero buffer pfcwd detection logic, verify forward action on Rx

4ba1ed0

Signed-off-by: Neetha John <nejo@microsoft.com>

neethajohn added the Enhancement label May 17, 2022

neethajohn requested a review from a team as a code owner May 17, 2022 19:07

neethajohn mentioned this pull request May 17, 2022

[201911][pfcwd] Avoid ingress drop by not attaching zero profiles when pfc storm is detected sonic-net/sonic-swss#2279

Merged

vivekrnv mentioned this pull request Jun 1, 2022

[PFC_WD] Avoid applying ZeroBuffer Profiles to ingress PG when a PFC storm is detected sonic-net/sonic-swss#2304

Merged

neethajohn mentioned this pull request Jun 9, 2022

[sonic-swss] : PFCWD recovery changes using DLR_INIT sonic-net/sonic-swss#2316

Merged

Update pfcwd wb tests for zero buffer enhancements

c038715

Signed-off-by: Neetha John <nejo@microsoft.com>

neethajohn added the Request for 202012 branch label Jul 6, 2022

neethajohn requested a review from yxieca July 11, 2022 21:03

yxieca approved these changes Jul 11, 2022

neethajohn merged commit 8c2320b into sonic-net:master Jul 11, 2022

neethajohn deleted the pfcwd_func_zero_buf branch July 11, 2022 21:07

wangxin added the Included in 202012 branch label Jul 12, 2022

rraghav-cisco mentioned this pull request Jul 29, 2022

Adding cisco-8000 to the list of platforms for forward action. #6068

Merged

neethajohn pushed a commit that referenced this pull request Aug 3, 2022

Adding cisco-8000 to the list of platforms for forward action. (#6068)

c8ff3f0

For cisco-8000 platforms, set forward action on Rx in presence of pfc-wd Change is made after: #5665

wangxin pushed a commit that referenced this pull request Aug 4, 2022

Adding cisco-8000 to the list of platforms for forward action. (#6068)

88f4e19

For cisco-8000 platforms, set forward action on Rx in presence of pfc-wd Change is made after: #5665
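The expectation this PR and #6068 introduce can be sketched as below: on platforms using the zero buffer detection logic, ingress (Rx) traffic is expected to be forwarded during a storm, while egress (Tx) still follows the configured pfcwd action. This is a minimal sketch under stated assumptions, not the code in the pfcwd tests; the set name, the function name, and the ASIC type strings are illustrative.

```python
# Assumed ASIC type strings for platforms that forward on Rx while the
# watchdog is triggered (Mellanox per this PR, cisco-8000 per #6068).
FORWARD_ON_RX_ASICS = {"mellanox", "cisco-8000"}


def expected_actions(asic_type, pfcwd_action):
    """Return the (rx_action, tx_action) a test would expect during a storm.

    Hypothetical helper: Tx always follows the configured pfcwd action,
    and Rx is overridden to "forward" only on the platforms listed above.
    """
    tx_action = pfcwd_action
    rx_action = pfcwd_action
    if pfcwd_action == "drop" and asic_type in FORWARD_ON_RX_ASICS:
        # No zero profile is attached to the ingress PG on these
        # platforms, so ingress traffic should not be dropped.
        rx_action = "forward"
    return rx_action, tx_action


print(expected_actions("mellanox", "drop"))
print(expected_actions("broadcom", "drop"))
```

Keeping the platform list in one place means a later addition such as cisco-8000 only touches that set.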
