Merge pull request sonic-net#232 from BRCM-SONIC/link_flap_err_disable_updates

Link flap err disable updates
Prasanth-KV committed Jun 25, 2021
2 parents a16dea7 + e718757 commit 5b037ec
Showing 2 changed files with 114 additions and 24 deletions.
83 changes: 76 additions & 7 deletions system/Interface_Down_Reason.md
@@ -59,6 +59,7 @@
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | 04/05/2021 | Prasanth K V | Initial version |
| 0.2 | 05/17/2021 | Madhukar K | Modified portchannel content |
| 0.3 | 06/22/2021 | Prasanth K V | Added REST details and DB schema |

# About this Manual
This document provides comprehensive functional and design information about the *Interface Down Reason* feature implementation in SONiC.
@@ -68,7 +69,7 @@ This document provides comprehensive functional and design information about the
### Table 1: Abbreviations
| **Term** | **Meaning** |
|--------------------------|-------------------------------------|
| PCS | Physical Coding Sub-layer |
| PMD | Physical Medium Dependent |
| LACP | Link Aggregation Control Protocol |

# 1 Feature Overview
@@ -101,8 +102,8 @@ So an interface flap affects the system in general and hence it is important to
- Transceiver not present
- Port breakout in-progress
- High BER
- PCS AM lock error
- PCS sync error
- PMD-CDR-lock
- PMD-signal-detected
- STP error disabled
- Transceiver error disabled
- UDLD error disabled
@@ -171,6 +172,26 @@ SAI specification has to be updated to get the events from SAI to upper layer.

### 3.2.1 CONFIG DB
### 3.2.2 APP DB
A new field, reason, has been added to PORT_TABLE:
```
"PORT_TABLE": {
    "Ethernet40": {
        ...
        "reason": "OPER_UP",
        ...
    }
}
```
A new table, IF_REASON_EVENT, is added to keep track of the events:
```
"IF_REASON_EVENT": {
    "Ethernet40": {
        "reason": "OPER_UP",
        "event": "PHY_link_up",
        "timestamp": "2021-06-06 09:29:55.639018"
    }
}
```
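As an illustration of how a consumer might read the new table, the following is a minimal Python sketch. It works on an in-memory dictionary copied from the example above rather than on a live APP DB; the helper name `latest_event` is hypothetical and not part of SONiC.

```python
from datetime import datetime

# Sample data copied from the IF_REASON_EVENT example; not read from a real DB.
app_db = {
    "IF_REASON_EVENT": {
        "Ethernet40": {
            "reason": "OPER_UP",
            "event": "PHY_link_up",
            "timestamp": "2021-06-06 09:29:55.639018",
        }
    }
}


def latest_event(db, port):
    """Return (reason, event, parsed timestamp) for a port, or None if absent.

    Hypothetical helper for illustration only.
    """
    entry = db.get("IF_REASON_EVENT", {}).get(port)
    if entry is None:
        return None
    # The timestamp format is inferred from the example entry.
    ts = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    return entry["reason"], entry["event"], ts


result = latest_event(app_db, "Ethernet40")
```

Parsing the timestamp into a `datetime` makes it possible to sort or compare events, which the raw string form does not guarantee across formats.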
### 3.2.3 STATE DB
### 3.2.4 ASIC DB
### 3.2.5 COUNTER DB
@@ -214,7 +235,7 @@ Name Description Oper Reason Speed MTU Alternate Name
----------------------------------------------------------------------------------
Eth1/1 - Down Admin-down 100000 9100 Ethernet0
Eth1/2/1 - Down Err-disabled 10000 9100 Ethernet4
Eth1/2/2 - Down Phy-link-down 10000 9100 Ethernet5
Eth1/2/2 - Down PHY-link-down 10000 9100 Ethernet5
Eth1/2/3 - Up Link-up 10000 9100 Ethernet6
```
- *show interface status <reason>*
@@ -225,13 +246,13 @@ sonic# show interface status err-disabled
----------------------------------------------------------------------------------
Name Event Timestamp
----------------------------------------------------------------------------------
Eth1/2/1 STP-down 2021-04-16 10:23:29
Eth1/2/1 STP-err-disabled 2021-04-16 10:23:29
```
- *show interface <interface>*
The show interface command displays the down reasons, as shown in the example below:
```
sonic# show interface Eth 1/2/2
Eth1/2/2 is up, line protocol is down, reason phy-link-down
Eth1/2/2 is up, line protocol is down, reason PHY-link-down
Remote-fault at 2021-01-06 07:49:45.737024
Local-fault at 2021-01-06 07:49:45.737024
Hardware is Eth
@@ -253,6 +274,29 @@ Output statistics:
6 Multicasts, 0 Broadcasts, 0 Unicast
```

The list of events:
- Admin-down
- Remote-fault
- Local-fault
- Link-training-failed
- Link-training-not-completed
- Link-training-not-started
- Link-tuning-failed
- Link-tuning-not-started
- Link-tuning-not-completed
- Incompatible-transceiver
- Transceiver-not-present
- Port-breakout-in-progress
- High-BER
- PMD-CDR-lock
- PMD-signal-detected
- STP-err-disabled
- Transceiver-err-disabled
- UDLD-err-disabled
- Link-flap-err-disabled
- PHY-link-up


#### Port channel interface
- *show interface status*
Along with the physical interfaces, configured portchannel interfaces are displayed in this command output. The new column, "Reason" displays the high level reason for portchannel down. The reasons are
@@ -313,7 +357,32 @@ Output statistics:
#### 3.6.2.3 Exec Commands

### 3.6.3 REST API Support
*URL-based view*

GET /restconf/data/openconfig-interfaces:interfaces/interface={name}/openconfig-if-ethernet:ethernet/state/openconfig-interfaces-ext:status/down-reason

Example response data:
{
  "openconfig-interfaces-ext:down-reason": "OPER_UP"
}


GET /restconf/data/openconfig-interfaces:interfaces/interface={name}/openconfig-if-ethernet:ethernet/state/openconfig-interfaces-ext:reason-events

Example response data:
{
  "openconfig-interfaces-ext:reason-events": {
    "down-reason-event": [
      {
        "reason-event": {
          "reason": "OPER_UP",
          "event": "PHY-link-up",
          "timestamp": "2021-06-06 09:29:55.639018"
        }
      }
    ]
  }
}
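
A client could unpack the reason-events response roughly as follows. This is a sketch in Python that parses the example body shown above as a string; it does not contact a switch, and the function name `reason_events` is an assumption made for illustration.

```python
import json

# Response body copied from the example above; no HTTP request is made here.
body = """
{
  "openconfig-interfaces-ext:reason-events": {
    "down-reason-event": [
      {
        "reason-event": {
          "reason": "OPER_UP",
          "event": "PHY-link-up",
          "timestamp": "2021-06-06 09:29:55.639018"
        }
      }
    ]
  }
}
"""


def reason_events(payload):
    """Flatten the nested response into a list of reason-event dicts.

    Hypothetical helper; tolerates a missing container or an empty list.
    """
    doc = json.loads(payload)
    container = doc.get("openconfig-interfaces-ext:reason-events", {})
    return [item["reason-event"] for item in container.get("down-reason-event", [])]


events = reason_events(body)
```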


### 3.6.4 gNMI Support
*Generally this is covered by the YANG specification. This section should also cover objects where on-change and interval based telemetry subscriptions can be configured.*
55 changes: 38 additions & 17 deletions system/intf-dampening-HLD.md
@@ -38,37 +38,37 @@ The Port Link Flap Error Disable feature uses an exponential decay mechanism to

When Port Link Flap Error Disable is enabled, the system monitors the number of times a port link state toggles from "up to down", and not from "down to up".

The sampling time or window (the time during which the specified toggle threshold can occur before the wait period is activated) is triggered when the first "up to down" transition occurs.
The sampling interval or window (the time during which the specified toggle threshold can occur before the recovery wait period is activated) is triggered when the first "up to down" transition occurs.

If the port link state toggles from up to down for a specified number of times within a specified period, the interface is physically disabled for the specified wait period. Once the wait period expires, the port link state is re-enabled. However, if the wait period is set to zero (0) seconds, the port link state will remain disabled until it is manually disabled and re-enabled or Port Link Flap Error Disable is disabled on this port.
If the port link state toggles from up to down for a specified number of times within a specified period, the interface is physically disabled for the specified recovery wait period. Once the recovery wait period expires, the port link state is re-enabled. However, if the recovery wait period is set to zero (0) seconds, the port link state will remain disabled until it is manually disabled and re-enabled or Port Link Flap Error Disable is disabled on this port.
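
The behaviour described above can be sketched as a small state machine. The Python below is an illustration under stated assumptions, not the SONiC implementation: the class name `LinkFlapMonitor` is hypothetical, the threshold is treated as reached when the flap count equals it, a new sampling window is assumed to start on the first flap after the previous window has expired, and time is passed in explicitly as seconds.

```python
class LinkFlapMonitor:
    """Counts up-to-down transitions per sampling window (illustrative only)."""

    def __init__(self, flap_threshold=3, sampling_interval=10, recovery_interval=300):
        self.flap_threshold = flap_threshold
        self.sampling_interval = sampling_interval
        self.recovery_interval = recovery_interval
        self.window_start = None   # time of the first flap in the current window
        self.flaps = 0             # up-to-down transitions seen in the window
        self.disabled_at = None    # time the port was error-disabled, if any

    def on_link_down(self, now):
        """Record an up-to-down transition; return True if the port is error-disabled."""
        if self.disabled_at is not None:
            return True
        # Assumption: an expired window is replaced by a new one starting at this flap.
        if self.window_start is None or now - self.window_start >= self.sampling_interval:
            self.window_start = now
            self.flaps = 0
        self.flaps += 1
        # Assumption: reaching the threshold is enough to disable the port.
        if self.flaps >= self.flap_threshold:
            self.disabled_at = now
        return self.disabled_at is not None

    def is_disabled(self, now):
        """Return the error-disabled state, re-enabling after the recovery wait period."""
        if self.disabled_at is None:
            return False
        if self.recovery_interval == 0:
            return True  # stays down until manual intervention
        if now - self.disabled_at >= self.recovery_interval:
            self.disabled_at = None
            self.window_start = None
            self.flaps = 0
            return False
        return True
```

Only `on_link_down` changes the flap count, which mirrors the statement that "down to up" transitions are not counted.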


## 1.1 Requirements
System shall be able to suppress interface state change events to protect system resources.
User shall be able to enable or disable the feature on individual interfaces and globally.
The feature must be disabled on all interfaces by default.
The feature shall be supported on physical interfaces.
There must be two sets of configuration parameters (sample-interval, recovery-interval, and flap-threshold) a per-interface set and a global set. If both global and per-interface are configured, the per-interface values are used only for given interfaces. Global values are used for all other physical interfaces.
There must be two sets of configuration parameters (sampling-interval, recovery-interval, and flap-threshold): a per-interface set and a global set. If both global and per-interface values are configured, the per-interface values are used only for the given interfaces. Global values are used for all other physical interfaces.
If no values are specified by the user, a default set of parameters is applied to all interfaces.
User shall be able to save configuration parameters (both global and per-interface).
The configuration parameters (both global and per-interface) must be preserved across device reboot.
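
One possible reading of the precedence rule (per-interface over global over defaults) is sketched below in Python. The function name `effective_params` and the dictionary layout are hypothetical, the default values are the ones listed in section 1.1.2, and merging parameter by parameter (rather than replacing the whole set) is an assumption the requirements do not spell out.

```python
# Default values as listed in section 1.1.2 of this document.
DEFAULTS = {"flap-threshold": 3, "sampling-interval": 10, "recovery-interval": 300}


def effective_params(port, global_cfg=None, per_port_cfg=None):
    """Resolve the parameters that apply to one port (illustrative only).

    Later updates win: defaults, then global values, then per-interface values.
    """
    params = dict(DEFAULTS)
    params.update(global_cfg or {})
    params.update((per_port_cfg or {}).get(port, {}))
    return params
```

For example, with a global `recovery-interval` of 60 and a per-interface `flap-threshold` of 10 on Ethernet0, Ethernet0 would use both overrides while every other port would use only the global one.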

### 1.1.1 Functional Requirements
Port Link Flap Error Disable shall use the parameters below to suppress interface state change events and protect the system.
- flap-threshold
Specifies the number of times a port link state goes from up to down before the wait period is activated. The value ranges from 1 through 50.
- sample-interval
Specifies the amount of time, in seconds, during which the specified toggle threshold can occur before the wait period is activated. The value ranges from 1 through 65535.
Specifies the number of times a port link state goes from up to down before the recovery wait period is activated. The value ranges from 1 through 50.
- sampling-interval
Specifies the amount of time, in seconds, during which the specified toggle threshold can occur before the recovery wait period is activated. The value ranges from 1 through 65535.
- recovery-interval
Specifies the amount of time in seconds, for which the port remains disabled (down) before it becomes enabled. The value ranges from 0 through 65534. A value of 0 indicates that the port will stay down until an administrative override occurs.
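
The value ranges above lend themselves to a simple check before a configuration is accepted. The following Python sketch is illustrative only; the names `RANGES` and `validate` are assumptions and do not correspond to existing SONiC code.

```python
# Inclusive ranges taken from the parameter descriptions above.
RANGES = {
    "flap-threshold": (1, 50),
    "sampling-interval": (1, 65535),
    "recovery-interval": (0, 65534),
}


def validate(params):
    """Return a list of error strings; an empty list means all values are in range."""
    errors = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{name} must be an integer from {low} through {high}")
    return errors
```

A `recovery-interval` of 0 passes the check, matching the rule that 0 means the port stays down until an administrative override occurs.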

### 1.1.2 Configuration and Management Requirements
- Port Link Flap Error Disable feature default is OFF on all physical interfaces and port-channels
- When Port Link Flap Error Disable is enabled, use below default values:
flap-threshold: 3
sample-interval: 10
sampling-interval: 10
recovery-interval: 300
- User shall be able to specify different sample-interval, flap-threshold and recovery-interval on a physical interface
- User shall be able to specify different sampling-interval, flap-threshold and recovery-interval on a physical interface
- User shall be able to display current Port Link Flap Error Disable configuration values.
- User shall be able to display current interface status if it was suppressed by Port Link Flap Error Disable
- User shall be able to display Link-Down-Reason if a port is disabled by Port Link Flap Error Disable feature
@@ -101,23 +101,37 @@ The Interface Error Disable feature exist in below modules and containers:
- *link-error-disable flap-threshold <flap count> sampling-interval <interval in sec> recovery-interval <recovery interval in sec>*
Example:
```
sonic(conf-if-Ethernet0)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10
sonic(conf-if-Ethernet0)# link-error-disable flap-threshold 10 sampling-interval 3 recovery-interval 10
```
In this example, the values for the parameters are as follows:

The flap-threshold is set at 10 times. This interval is the number of times that the port's link state goes from up to down and down to up before the recovery-timeout is activated. Enter a valid value range from 1-50. Default is 3.
The flap-threshold is set to 10. This value is the number of times that the port's link state goes from up to down and down to up before the recovery-interval is activated. Enter a valid value range from 1-50. Default is 3.


The sampling-time is set to 3 seconds. This time period is the amount of time during which the specified flap-threshold can be crossed. If the flap-threshold is crossed during this sampling-time, port will be error-disabled. Enter a value between 1 and 65535 seconds. Default is 10.
The sampling-interval is set to 3 seconds. This time period is the amount of time during which the specified flap-threshold can be crossed. If the flap-threshold is crossed during this sampling-interval, port will be error-disabled. Enter a value between 1 and 65535 seconds. Default is 10.


The recovery-timeout is set to 10 seconds. This period of time is the amount of time the port remains disabled (down) before it becomes enabled. Entering 0 indicates that the port will stay down until an administrative override occurs. Enter a value between 0 and 65534 seconds. Default is 300.
The recovery-interval is set to 10 seconds. This period of time is the amount of time the port remains disabled (down) before it becomes enabled. Entering 0 indicates that the port will stay down until an administrative override occurs. Enter a value between 0 and 65534 seconds. Default is 300.


This config command can be executed on a range of interfaces as well. Example:
```
sonic(conf-if-range-eth**)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10
sonic(conf-if-range-eth**)# link-error-disable flap-threshold 10 sampling-interval 3 recovery-interval 10
```

The following command is used to enable link-error-disable with default values for flap-threshold, sampling-interval and recovery-interval:

```
sonic(conf-if-Ethernet0)#link-error-disable
```

This command to enable link-error-disable with default parameters is supported for the range of interfaces as well, as shown in the below example:

```
sonic(conf-if-range-eth**)# link-error-disable
```


Example for disabling link-flap error-disable on a port:
```
sonic(conf-if-Ethernet0)#no link-error-disable
@@ -165,12 +179,19 @@ The ports which does not have non-default error disable configurations will not
Example:
```
sonic#show errdisable link-flap
Interface Flap-threshold Sampling-time Recovery-timeout Status
Interface Flap-threshold Sampling-interval Recovery-interval Status
---------------------------------------------------------------------------
Ethernet0 10 3 30 Errdisabled
Ethernet4 10 3 60 Not-errdisabled
Ethernet8 5 10 300 Off
Ethernet0 10 3 30 Errdisabled
Ethernet4 10 3 60 Not-errdisabled
Ethernet8 5 10 300 Off
```

The possible status values are:
1. Errdisabled: The number of link flaps in a sampling interval crossed the threshold and the port is currently in the err-disabled state.
2. Not-errdisabled: Err-disable is enabled, but the number of flaps in a sampling interval has not crossed the configured threshold.
3. Off: The err-disable parameters are configured, but the feature is not enabled.
4. On: Err-disable is enabled and no link flaps have occurred since it was enabled.
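
The four status values could be derived from three pieces of per-port state, as in the Python sketch below. This is one interpretation only: the function name and its inputs are hypothetical, and it assumes that Not-errdisabled applies when at least one flap has been seen without the port being error-disabled.

```python
def errdisable_status(enabled, flap_count, err_disabled):
    """Map per-port state to the Status column text (illustrative only)."""
    if not enabled:
        return "Off"               # parameters configured, feature not enabled
    if err_disabled:
        return "Errdisabled"       # threshold crossed, port currently down
    if flap_count == 0:
        return "On"                # enabled, no flaps seen since then
    return "Not-errdisabled"       # flaps seen, threshold not crossed
```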

# 2.2 Functional Description

# 3 Design