Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mirroring] Missing per-vendor validation of mirror session queue parameter #8189

Closed
raphaelt-nvidia opened this issue Jul 15, 2021 · 12 comments · Fixed by sonic-net/sonic-swss#1957
Labels

Comments

@raphaelt-nvidia
Copy link
Contributor

Description

It is possible to configure a value for queue in a mirror session that is outside the range supported by the switch vendor. The flow is that MirrorOrch::createEntry calls activateSession which calls status = sai_mirror_api->create_mirror_session. If it fails, e.g. due to invalid queue, activateSession calls handleSaiCreateStatus, which in any failure case initiates exiting orchagent.

I think the solution involves orchagent comparing the user-supplied value of queue with SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES as part of its earlier validations, and not attempting to create the session if this validation fails.

Steps to reproduce the issue:

  1. config mirror_session erspan add ms6 40.0.0.1 40.0.0.2 63 250 "" 15 Ethernet136 tx

Describe the results you received:

ERR syncd#SDK: [SPAN.ERR] SWITCH_PRIO 15 is outside valid range 0-14.
ERR syncd#SDK: [SPAN.ERR] __span_add failed, err: Parameter Error.
ERR syncd#SDK: [SAI_MIRROR.ERR] mlnx_sai_mirror.c[2647]- mlnx_create_mirror_session: Error creating mirror session
ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_INVALID_PARAMETER
ERR syncd#SDK: :- processQuadEvent: attr: SAI_MIRROR_SESSION_ATTR_TC: 15
ERR swss#orchagent: :- create: create status: SAI_STATUS_INVALID_PARAMETER
ERR swss#orchagent: :- activateSession: Failed to activate mirroring session ms6
ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_MIRROR, status: SAI_STATUS_INVALID_PARAMETER

Describe the results you expected:

A single error in log from orchagent only, no exiting the process.

Output of show version:

SONiC Software Version: SONiC.202106.0-ebc962c22
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: ebc962c
Build date: Thu Jun 24 17:59:46 UTC 2021
Built by: raphaelt@r-build-sonic05

Platform: x86_64-mlnx_msn2410-r0
HwSKU: ACS-MSN2410
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1921X01546
Uptime: 13:55:39 up 2:44, 2 users, load average: 1.58, 1.65, 1.65

Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-mlnx 202106.0-ebc962c22 d00568dad081 961MB
docker-syncd-mlnx latest d00568dad081 961MB
docker-snmp 202106.0-ebc962c22 69263d7ff0ca 454MB
docker-snmp latest 69263d7ff0ca 454MB
docker-dhcp-relay 202106.0-ebc962c22 12bd15bd2785 420MB
docker-dhcp-relay latest 12bd15bd2785 420MB
docker-teamd 202106.0-ebc962c22 92f76cb0c626 424MB
docker-teamd latest 92f76cb0c626 424MB
docker-nat 202106.0-ebc962c22 36858e717963 427MB
docker-nat latest 36858e717963 427MB
docker-router-advertiser 202106.0-ebc962c22 34fc1983f5d3 413MB
docker-router-advertiser latest 34fc1983f5d3 413MB
docker-platform-monitor 202106.0-ebc962c22 74dd0063f256 738MB
docker-platform-monitor latest 74dd0063f256 738MB
docker-lldp 202106.0-ebc962c22 ce9a9e42fecd 453MB
docker-lldp latest ce9a9e42fecd 453MB
docker-macsec 202106.0-ebc962c22 89d2ebe2ddaa 427MB
docker-macsec latest 89d2ebe2ddaa 427MB
docker-database 202106.0-ebc962c22 32ce5d38e191 413MB
docker-database latest 32ce5d38e191 413MB
docker-orchagent 202106.0-ebc962c22 c6254d27f85d 442MB
docker-orchagent latest c6254d27f85d 442MB
docker-sonic-telemetry 202106.0-ebc962c22 bd04168564d8 501MB
docker-sonic-telemetry latest bd04168564d8 501MB
docker-sonic-mgmt-framework 202106.0-ebc962c22 a5ec5adee9cd 570MB
docker-sonic-mgmt-framework latest a5ec5adee9cd 570MB
docker-fpm-frr 202106.0-ebc962c22 5436d8f7e983 442MB
docker-fpm-frr latest 5436d8f7e983 442MB
docker-sflow 202106.0-ebc962c22 0d353410dcb2 425MB
docker-sflow latest 0d353410dcb2 425MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@zhangyanzhao
Copy link
Collaborator

orchagent crashed for this issue. @shi-su please work with @raphaelt-nvidia on this issue.

@zhangyanzhao zhangyanzhao added the Triaged this issue has been triaged label Jul 21, 2021
@raphaelt-nvidia
Copy link
Contributor Author

Here is suggested code to add to MirrorOrch::createEntry that would address this:

        else if (fvField(i) == MIRROR_SESSION_QUEUE)
        {
            sai_status_t status;
            sai_attribute_t attr;
            entry.queue = to_uint<uint8_t>(fvValue(i));
            attr.id = SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES;
            status = sai_switch_api->get_switch_attribute(gSwitchId, 1, &attr);
            if (status == SAI_STATUS_SUCCESS)
            {
                if (entry.queue > attr.value.u8)
                {
                    SWSS_LOG_ERROR("Failed to get valid queue %s", fvValue(i).c_str());
                    return task_process_status::task_invalid_entry;
                }
            }
        }

My doubt is about the line

                if (entry.queue > attr.value.u8)

Should it be '>' or '>='? I chose '>' because of the attribute's description:

/**
 * @brief Maximum traffic classes limit
 *
 * @type sai_uint8_t
 * @flags READ_ONLY
 */
SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES,

For example, if the valid values for queue are 0-14, I would expect get_switch_attribute to return 14. Or would you expect 15?

@shi-su
Copy link
Contributor

shi-su commented Aug 5, 2021

Here is suggested code to add to MirrorOrch::createEntry that would address this:

        else if (fvField(i) == MIRROR_SESSION_QUEUE)
        {
            sai_status_t status;
            sai_attribute_t attr;
            entry.queue = to_uint<uint8_t>(fvValue(i));
            attr.id = SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES;
            status = sai_switch_api->get_switch_attribute(gSwitchId, 1, &attr);
            if (status == SAI_STATUS_SUCCESS)
            {
                if (entry.queue > attr.value.u8)
                {
                    SWSS_LOG_ERROR("Failed to get valid queue %s", fvValue(i).c_str());
                    return task_process_status::task_invalid_entry;
                }
            }
        }

My doubt is about the line

                if (entry.queue > attr.value.u8)

Should it be '>' or '>='? I chose '>' because of the attribute's description:

/**
 * @brief Maximum traffic classes limit
 *
 * @type sai_uint8_t
 * @flags READ_ONLY
 */
SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES,

For example, if the valid values for queue are 0-14, I would expect get_switch_attribute to return 14. Or would you expect 15?

I am not sure about this, but I felt it should be '>=' since it seems to be defined as the maximum number and 0-14 includes 15 values. In an extreme case that it supports nothing, the return value should be 0, otherwise, it could not cover it. This is just my feeling, we need to check.

Another concern about this change is that it seems to check for SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES every time before creating an entry. It seems to be a bit inefficient. Maybe we can query that value during initialization and save it for future use?

@raphaelt-nvidia
Copy link
Contributor Author

I see that many instances of calling sai_switch_api->get_switch_attribute occur in initialization routines, so I agree with your second comment. If you wish me to supply the code, I would like to wait until there is a definite ruling on my original question, so that we can test it with a conforming SAI implementation.

@shi-su
Copy link
Contributor

shi-su commented Aug 5, 2021

I see that many instances of calling sai_switch_api->get_switch_attribute occur in initialization routines, so I agree with your second comment. If you wish me to supply the code, I would like to wait until there is a definite ruling on my original question, so that we can test it with a conforming SAI implementation.

I tried your proposed fix. Interestingly, for the scenario that valid values for queue are 0-14, the SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES value is neither 14 nor 15. I got 16 for this attribute. Not sure what went wrong.

@raphaelt-nvidia
Copy link
Contributor Author

The 16 is a bug in our SAI implementation that has been changed to 15 - not sure when that goes upstream. My question here is whether it should actually be changed to 14.

@shi-su
Copy link
Contributor

shi-su commented Aug 6, 2021

The 16 is a bug in our SAI implementation that has been changed to 15 - not sure when that goes upstream. My question here is whether it should actually be changed to 14.

Per my understanding, 15 makes better sense. Yet this question should better be answered by someone who is familiar with the SAI definition.

@raphaelt-nvidia
Copy link
Contributor Author

How do we identify and get the attention of the people who should decide? It seems to me that if the decision is 15, then the comment "Maximum traffic classes limit" should be changed and clarified. Also say something about the lower bound. Is the assumption that 0 is the lowest valid value true for all platforms?

@shi-su
Copy link
Contributor

shi-su commented Aug 9, 2021

@zhangyanzhao Could you please help get attention from someone who has expertise in the SAI definition? This seems to go beyond my knowledge set.

@raphaelt-nvidia
Copy link
Contributor Author

Ping

@zhangyanzhao
Copy link
Collaborator

Ack and let me see how can help on the SAI part.

@prsunny
Copy link
Contributor

prsunny commented Sep 11, 2021

Agree with @shi-su on the suggestion. It should be >= . This is what is being done for cases like MAX ECMP groups (ref). Also as suggested above, please get the value only once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants