Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HLD] DSCP/TC remapping for tunnel traffic #950

Merged
merged 14 commits into from
Mar 29, 2022
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
258 changes: 258 additions & 0 deletions doc/qos/tunnel_dscp_remapping.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,258 @@
# DSCP remapping for tunnel traffic

## 1 Table of Content ###

- [Revision](#11-revision)
- [Scope](#2-scope)
- [Definitions/Abbreviations](#3-definitionsabbreviations)
- [Overview](#4-overview)
- [Design](#5-design)
- [SWSS Schema](#51-swss-schema)
- [Define new table for mapping](#511-define-new-table-for-mapping)
- [Update existing TUNNEL table](#512-update-existing-tunnel-table)
- [Define new field for extra lossless queues](#513-define-new-field-for-extra-lossless-queues)
- [SAI attribute](#52-sai-attribute)
- [orchagent](#53-orchagent)
- [Test requirement](#6-test-requirement)
- [Open Questions](#7-open-questions)

### 1.1 Revision ###
| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | | Bing Wang | Initial version |


## 2 Scope ##

This document covers high level design of DSCP and TC remapping for tunnel traffic in SONiC.

## 3 Definitions/Abbreviations ##


| Term | Meaning |
|:--------:|:---------------------------------------------:|
| PFC | Priority-based Flow Control |
| TC | Traffic class|
| DSCP| Differentiated Services Code Point |

## 4 Overview

In Dual-ToR scenario, PFC deadlock can happen if two servers are congested at same time and start sending PFC pause frames to both upper ToR and lower ToR. It is because the south bound traffic from T1 to standby ToR is bounced back to T1 via the same queue. Even when the block condition is resolved, the pause condition among T1 and both ToRs can stay forever and no traffic will go through.

To avoid this scenario, the bounced-back traffic from standby ToR to T1 will be remapped into another queue. For example, the traffic flow is as below if we are going to remap traffic in queue 3 to queue 2.

<p align=center>
<img src="dscp-remapping-images/Bounced-back-traffic-flow.png" alt="Figure 1. Bounced back traffic flow">
</p>
lguohan marked this conversation as resolved.
Show resolved Hide resolved

When congestion happens on server, the traffic flow of PFC pause frames is as below diagram.

<p align=center>
<img src="dscp-remapping-images/Bounced-back-traffic-flow-PFC.png" alt="Figure 2. Bounced back traffic flow with PFC pause">
</p>
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
The current QoS map architecture allows for port-based selection of each QoS map. However, we are not able to override the port-based QoS map for tunnel traffic.
This design proposes a method to remapping DSCP and TC for tunnel traffic.


## 5 Design ##

### 5.1 SWSS Schema
#### 5.1.1 Define new table for mapping
Update [qos_config.j2](https://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/qos_config.j2) to generate 4 tables for remapping. Currently, the remapping is required in `dual-tor` scenario. So the tables are rendered into `config_db` only when `DEVICE_METADATA['localhost']['subtype'] = 'DualToR`.

Please be noted that below config is to remap traffic in queue 3 to queue 2, and traffic in queue 4 to queue 6.
Before remapping to queue 4 and 6, both queues are required to be cleared. Hence the current `DSCP_TO_TC_MAP|AZURE` in [qos_config.j2](https://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/qos_config.j2) is required to be updated to map DSCP `5` and `48` into other queues.
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
* Table for decap

DSCP_TO_TC_MAP for mapping DSCP to TC

```json
"DSCP_TO_TC_MAP": {
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
"AZURE_TUNNEL": {
"0": "1",
"1": "1",
"2": "2",
"3": "3",
"4": "4",
"5": "1",
"6": "6",
"7": "1",
"8": "0",
"9": "1"
}
```

TC_TO_PRIORITY_GROUP_MAP for mappping TC to PG

```json
"TC_TO_PRIORITY_GROUP_MAP": {
"AZURE_TUNNEL": {
"0": "0",
"1": "0",
"2": "0",
"3": "2",
"4": "6",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean PG 2 and 6 will be used as lossless PG? In that case BUFFER_PG|<port>|2, BUFFER_PG|<port>|6 need to be configured. But traditional buffer manager supports 3-4 only. This also needs to be updated
#define LOSSLESS_PGS "3-4"
string buffer_pg_key = port + m_cfgBufferPgTable.getTableNameSeparator() + LOSSLESS_PGS;

"5": "0",
"6": "0",
"7": "7"
}
```

* Table for encap

TC_TO_QUEUE_MAP for remapping queue

```json
"TC_TO_QUEUE_MAP": {
"AZURE_TUNNEL": {
"0": "0",
"1": "1",
"2": "1",
"3": "2",
"4": "6",
"5": "5",
"6": "1",
"7": "7"
}
```

TC_TO_DSCP_MAP for rewriting DSCP

```json
"TC_TO_DSCP_MAP": {
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
"AZURE_TUNNEL": {
"0": "8",
"1": "0",
"2": "0",
"3": "2",
"4": "6",
"5": "46",
"6": "0",
"7": "48"
}
```

To support the new table, a new YANG model `sonic-tc-dscp.yang` is required
#### 5.1.2 Update existing TUNNEL table
1. Change `dscp_mode` from `uniform` to `pipe` for TC remapping
2. Add TC remapping config if TC remapping is enabled

```json
"TUNNEL": {
"MuxTunnel0": {
"dscp_mode": "pipe",
"dst_ip": "10.1.0.32",
"ecn_mode": "copy_from_outer",
"encap_ecn_mode": "standard",
"ttl_mode": "pipe",
"tunnel_type": "IPINIP",
"decap_dscp_to_tc_map": "[DSCP_TO_TC_MAP|AZURE_TUNNEL]",
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
"decap_tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE_TUNNEL]",
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
"encap_tc_to_queue_map": "[TC_TO_QUEUE_MAP|AZURE_TUNNEL]",
"encap_tc_color_to_dscp_map": "[TC_TO_DSCP_MAP|AZURE_TUNNEL]"
}
}
```

#### 5.1.3 Define new field for extra lossless queues
Two new fields are added to specify software or hardward PFC watchdog.

* `pfc_wd_sw_enable` Specify the queue(s) to enable software PFC watchdog
* `pfc_wd_hw_enable` Specify the queue(s) to enable hardware PFC watchdog.
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
neethajohn marked this conversation as resolved.
Show resolved Hide resolved

In current version, software PFC watchdog will read field `pfc_enable` to determine PFCWD is enabled on which queue(s). To maintain compatible with current logic, `db_migrator` script is required to be updated.

```json
"PORT_QOS_MAP": {
"Ethernet0": {
"dscp_to_tc_map": "[DSCP_TO_TC_MAP|AZURE]",
"pfc_enable": "3,4,2,6",
"pfc_wd_sw_enable": "3,4",
"pfc_wd_hw_enable": "2,6",
neethajohn marked this conversation as resolved.
Show resolved Hide resolved
"pfc_to_queue_map": "[MAP_PFC_PRIORITY_TO_QUEUE|AZURE]",
"tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]",
"tc_to_queue_map": "[TC_TO_QUEUE_MAP|AZURE]"
}
}
```

To support new field `pfc_wd_sw_enable` and `pfc_wd_hw_enable`, [sonic-port-qos-map.yang](https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-port-qos-map.yang) is required to be updated.


### 5.2 SAI attribute
TC remapping requires below SAI attributes change.
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
```cpp
/**
* @brief Enable TC AND COLOR -> DSCP MAP on tunnel at encapsulation (access-to-network) node to remark the DSCP in tunnel header
*/
SAI_TUNNEL_ATTR_ENCAP_QOS_TC_AND_COLOR_TO_DSCP_MAP,

/**
* @brief Enable TC -> Queue MAP on tunnel encap
*/
SAI_TUNNEL_ATTR_ENCAP_QOS_TC_TO_QUEUE_MAP,

/**
* @brief Enable DSCP -> TC MAP on tunnel at termination (Network-to-access) node. This map if configured overrides the port MAP
*/
SAI_TUNNEL_ATTR_DECAP_QOS_DSCP_TO_TC_MAP,

/**
* @brief Enable TC -> Priority Group MAP. TC is derived from the tunnel MAP
*/
SAI_TUNNEL_ATTR_DECAP_QOS_TC_TO_PRIORITY_GROUP_MAP,
```
For instance, when we get a traffic flow with DSCP = 3 on T1, the traffic and bounced back traffic is delivered and remapped as below:

1. Traffic from `T1` to `Standby ToR`
- Traffic mapped to `TC3`, `PG3` and `Queue 3` by port level QoS mapping
2. Bounced back traffic from `Standby ToR` to `T1`
- Traffic arrived at `Standby ToR` in `TC3`, `PG3` and `Queue 3` as per port level QoS mapping
neethajohn marked this conversation as resolved.
Show resolved Hide resolved
- Packet will be encapped and delivered back to `T1` by `MuxTunnel`
- The outer `DSCP` is rewritten to `2` as specified in `TC_TO_DSCP_MAP|AZURE_TUNNEL` by SAI attribute `SAI_TUNNEL_ATTR_ENCAP_QOS_TC_AND_COLOR_TO_DSCP_MAP`.
- Traffic is delivered in `Queue 2` as specified in `TC_TO_QUEUE_MAP|AZURE_TUNNEL` by SAI attribute `SAI_TUNNEL_ATTR_ENCAP_QOS_TC_TO_QUEUE_MAP`
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
3. Bounced back traffic from `T1` to `Active ToR`
- Bounced back traffic arrive at `T1` in `Queue 2` and `PG2` by port level QoS mapping
- Bounced back traffic will be routed to `Active ToR`
4. Traffic from `Active ToR` to `Server`
- Traffic arrived at `Active ToR` and will be decapped and delivered to server
- The outer `DSCP` is ignored as the `dscp_mode` for `MuxTunnel` is `PIPE`. The inner `DSCP3` is copied to outer layer
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
- Traffic is remapped to `TC 3` as specified in `DSCP_TO_TC_MAP|AZURE_TUNNEL` by SAI attribute `SAI_TUNNEL_ATTR_DECAP_QOS_DSCP_TO_TC_MAP`
- Traffic is remapped to `PG 2` as specified in `TC_TO_PRIORITY_GROUP_MAP|AZURE_TUNNEL` by SAI attribute `SAI_TUNNEL_ATTR_DECAP_QOS_TC_TO_PRIORITY_GROUP_MAP`
- Traffic is in `Queue 3` as per port level QoS mapping
- Decapped traffic is delivered to target server
### 5.3 orchagent

Code change in orchagent
neethajohn marked this conversation as resolved.
Show resolved Hide resolved

1. Update `tunneldecaporch` to read and set new tunnel attributes when creating decap tunnel.

| Attribute | Value |
|---|-----------|
| SAI_TUNNEL_ATTR_ENCAP_QOS_TC_AND_COLOR_TO_DSCP_MAP | [TC_TO_DSCP_MAP\|AZURE_TUNNEL]|
bingwang-ms marked this conversation as resolved.
Show resolved Hide resolved
| SAI_TUNNEL_ATTR_ENCAP_QOS_TC_TO_QUEUE_MAP | [TC_TO_QUEUE_MAP\|AZURE_TUNNEL] |

2. Update `create_tunnel` defined in `muxorch.cpp` to read and set new tunnel attributes when creating tunnel.
| Attribute | Value |
|---|-----------|
| SAI_TUNNEL_ATTR_DECAP_QOS_DSCP_TO_TC_MAP | [DSCP_TO_TC_MAP\|AZURE_TUNNEL]|
| SAI_TUNNEL_ATTR_DECAP_QOS_TC_TO_PRIORITY_GROUP_MAP | [TC_TO_PRIORITY_GROUP_MAP\|AZURE_TUNNEL |
## 6 Test requirement
All changes are to be covered by system test.
* Encap at standby side

* Test case 1 Verify DSCP re-writing
* test case 2 Verify traffic is egressed at expected queue
* Test case 3 Verify PFC frame generation at expected queue

* Decap at active side

* Test case 1 Verify packets egressed to server at expected queue
* Test case 2 Verify PFC pause frame block expected queue
* Test case 3 Verify PFC frame generation at expected queue


## 7 Open Questions