5232/PFC/WD/The orchagent crashes with PG3 and PG4 traffic send together #5136

Open
mini-nair-dell opened this issue Aug 10, 2020 · 4 comments

@mini-nair-dell

Topology:

(source) Ixia1 ------50gig-----Switch------100gig-------Ixia2 (Receiver)

Description:

The orchagent crashes when priority group P3 and P4 traffic are sent together.
The orchagent crashes for the P3 or P4 queue, the traffic behaves as lossy, and PFCWD doesn't work.
The issue is not seen when the uplink and downlink are both 100gig.

Repro Steps:

  1. Send P3 traffic from the source.
  2. Send pause frames from the receiver. No issue is seen.
  3. Send P4 traffic along with P3 from the source.
  4. Send pause frames for P3 and P4 from the receiver. The orchagent crashes.

The stack trace:

Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/orchagent...(no debugging symbols found)...done.
[New LWP 64]
[New LWP 63]
[New LWP 44]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/orchagent -d /var/log/swss -b 8192 -m 3c:2c:30:6d:7e:80'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fe2649f4fff in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fe25bfa7700 (LWP 64))]
(gdb) bt
#0 0x00007fe2649f4fff in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fe2649f642a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fe26530d0ad in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007fe26530b066 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007fe26530b0b1 in std::terminate() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007fe265335e9e in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fe2664cd4a4 in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007fe264aaad0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Logs:

For queue 3:
Aug 10 18:02:45.942360 sonic-s5232-01 NOTICE swss#orchagent: :- startWdActionOnQueue: Receive notification, storm
Aug 10 18:02:45.942546 sonic-s5232-01 NOTICE swss#orchagent: :- startWdActionOnQueue: PFC Watchdog detected PFC storm on port Ethernet28, queue index 3, queue id 0x150000000002d3 and port id 0x100000000000f.
Aug 10 18:02:45.943140 sonic-s5232-01 NOTICE swss#orchagent: :- addAclTable: Created ACL table IngressTable_PfcWdAclHandler_3 oid:7000000000b15
Aug 10 18:02:45.943486 sonic-s5232-01 NOTICE swss#orchagent: :- add: Successfully created ACL rule Rule_PfcWdAclHandler_3 in table IngressTable_PfcWdAclHandler_3
Aug 10 18:02:45.944050 sonic-s5232-01 NOTICE swss#orchagent: :- addAclTable: Created ACL table EgressTable_PfcWdAclHandler_3 oid:7000000000b18
Aug 10 18:02:45.944408 sonic-s5232-01 NOTICE swss#orchagent: :- add: Successfully created ACL rule Rule_PfcWdAclHandler_3 in table EgressTable_PfcWdAclHandler_3
Aug 10 18:02:45.946835 sonic-s5232-01 ERR syncd#syncd: [none] _brcm_sai_create_acl_table:5525 field group create failed with error No resources for operation (0xfffffff2).
Aug 10 18:02:45.947015 sonic-s5232-01 ERR syncd#syncd: [none] brcm_sai_create_acl_table:109 create table entry failed with error -4.
Aug 10 18:02:45.947104 sonic-s5232-01 ERR syncd#syncd: :- processEvent: attr: SAI_ACL_TABLE_ATTR_ACL_BIND_POINT_TYPE_LIST: 1:SAI_ACL_BIND_POINT_TYPE_PORT
Aug 10 18:02:45.947232 sonic-s5232-01 ERR syncd#syncd: :- processEvent: attr: SAI_ACL_TABLE_ATTR_FIELD_TC: true
Aug 10 18:02:45.947322 sonic-s5232-01 ERR syncd#syncd: :- processEvent: attr: SAI_ACL_TABLE_ATTR_ACL_STAGE: SAI_ACL_STAGE_INGRESS
Aug 10 18:02:45.947527 sonic-s5232-01 NOTICE swss#orchagent: :- handle_switch_shutdown_request: switch shutdown request

For queue 4:
root@sonic-s5232-01:~# zcat /var/log/syslog.*.gz | grep -i "Aug 7 21:37"
Aug 7 21:37:07.736268 sonic-s5232-01 NOTICE swss#orchagent: :- startWdActionOnQueue: Receive notification, storm
Aug 7 21:37:07.736532 sonic-s5232-01 NOTICE swss#orchagent: :- startWdActionOnQueue: PFC Watchdog detected PFC storm on port Ethernet28, queue index 4, queue id 0x150000000002d4 and port id 0x100000000000f.
Aug 7 21:37:07.736936 sonic-s5232-01 NOTICE swss#orchagent: :- addAclTable: Created ACL table IngressTable_PfcWdAclHandler_4 oid:7000000000b1e
Aug 7 21:37:07.737397 sonic-s5232-01 NOTICE swss#orchagent: :- add: Successfully created ACL rule Rule_PfcWdAclHandler_4 in table IngressTable_PfcWdAclHandler_4
Aug 7 21:37:07.738036 sonic-s5232-01 NOTICE swss#orchagent: :- addAclTable: Created ACL table EgressTable_PfcWdAclHandler_4 oid:7000000000b21
Aug 7 21:37:07.738476 sonic-s5232-01 NOTICE swss#orchagent: :- add: Successfully created ACL rule Rule_PfcWdAclHandler_4 in table EgressTable_PfcWdAclHandler_4
Aug 7 21:37:07.741205 sonic-s5232-01 ERR syncd#syncd: [none] _brcm_sai_create_acl_table:5525 field group create failed with error No resources for operation (0xfffffff2).
Aug 7 21:37:07.741540 sonic-s5232-01 ERR syncd#syncd: [none] brcm_sai_create_acl_table:109 create table entry failed with error -4.

The syslogs are attached

Thanks
Mini

@wendani
Contributor

wendani commented Aug 10, 2020

Looks like there is no spare TCAM table to accommodate the second TC.

@mini-nair-dell
Author

MSFT RDMA Qualification Test Cases v1.4.docx

Attached the test plan doc.

@AshokDaparthi
Contributor

@wendani - All PFCWD entries can go in the same TCAM table. Not sure why a separate table is created for each PG.

@AshokDaparthi
Contributor

Root cause: Each PG creates a separate FP group. The S5232 may have a limited number of FP groups, which leads to syncd crashing with a resource limitation. All storm entries can go into the same FP group. Also, at present the rules are not deleted after restoring from a storm, which takes up entries unnecessarily.

Broadcom plans to upstream a fix.
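
To illustrate the root cause described above, here is a toy Python model, not the actual orchagent or Broadcom SAI code. `FpGroupPool`, `storm_per_pg`, `storm_shared`, `restore_shared`, and the capacity of 1 are all hypothetical and chosen only for illustration; the real FP group limit on the S5232 is not stated in this thread. The sketch contrasts creating one group per stormed PG with placing every storm entry in one shared group and deleting the entry on restore.

```python
class FpGroupPool:
    """Toy stand-in for a fixed pool of hardware FP groups (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed limit, not the real S5232 value
        self.groups = {}          # group name -> list of rule names

    def create_group(self, name):
        # Mirrors the "No resources for operation" failure seen in the syslog.
        if len(self.groups) >= self.capacity:
            raise RuntimeError("No resources for operation")
        self.groups[name] = []

    def add_rule(self, group, rule):
        self.groups[group].append(rule)

    def remove_rule(self, group, rule):
        self.groups[group].remove(rule)


def storm_per_pg(pool, queue):
    """Behaviour reported in the thread: a new group for every stormed PG."""
    name = f"PfcWd_{queue}"
    pool.create_group(name)
    pool.add_rule(name, f"Rule_{queue}")


def storm_shared(pool, queue, shared="PfcWd_shared"):
    """Suggested alternative: all storm entries share one group."""
    if shared not in pool.groups:
        pool.create_group(shared)
    pool.add_rule(shared, f"Rule_{queue}")


def restore_shared(pool, queue, shared="PfcWd_shared"):
    """Delete the rule on restore so it does not hold an entry unnecessarily."""
    pool.remove_rule(shared, f"Rule_{queue}")


if __name__ == "__main__":
    per_pg = FpGroupPool(capacity=1)
    storm_per_pg(per_pg, 3)
    try:
        storm_per_pg(per_pg, 4)   # second PG needs a second group
    except RuntimeError as err:
        print("per-PG groups:", err)

    shared = FpGroupPool(capacity=1)
    storm_shared(shared, 3)
    storm_shared(shared, 4)       # fits: both rules live in one group
    restore_shared(shared, 3)     # entry is released after the storm clears
    print("shared group:", shared.groups)
```

In this model the per-PG approach exhausts the pool as soon as the second queue storms, while the shared approach needs only one group regardless of how many queues are affected.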
