Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transient fault handling doc for gRPC retries #21621

Merged
merged 19 commits into from
Mar 9, 2021
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion aspnetcore/grpc/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,14 @@ gRPC client configuration is set on `GrpcChannelOptions`. The following table de
| DisposeHttpClient | `false` | If set to `true` and an `HttpMessageHandler` or `HttpClient` is specified, then either the `HttpHandler` or `HttpClient`, respectively, is disposed when the `GrpcChannel` is disposed. |
| LoggerFactory | `null` | The `LoggerFactory` used by the client to log information about gRPC calls. A `LoggerFactory` instance can be resolved from dependency injection or created using `LoggerFactory.Create`. For examples of configuring logging, see <xref:grpc/diagnostics#grpc-client-logging>. |
| MaxSendMessageSize | `null` | The maximum message size in bytes that can be sent from the client. Attempting to send a message that exceeds the configured maximum message size results in an exception. When set to `null`, the message size is unlimited. |
| <span style="word-break:normal;word-wrap:normal">MaxReceiveMessageSize</span> | 4 MB | The maximum message size in bytes that can be received by the client. If the client receives a message that exceeds this limit, it throws an exception. Increasing this value allows the client to receive larger messages, but can negatively impact memory consumption. When set to `null`, the message size is unlimited. |
| MaxReceiveMessageSize | 4 MB | The maximum message size in bytes that can be received by the client. If the client receives a message that exceeds this limit, it throws an exception. Increasing this value allows the client to receive larger messages, but can negatively impact memory consumption. When set to `null`, the message size is unlimited. |
| Credentials | `null` | A `ChannelCredentials` instance. Credentials are used to add authentication metadata to gRPC calls. |
| CompressionProviders | gzip | A collection of compression providers used to compress and decompress messages. Custom compression providers can be created and added to the collection. The default configured providers support **gzip** compression. |
| ThrowOperationCanceledOnCancellation | `false` | If set to `true`, clients throw <xref:System.OperationCanceledException> when a call is canceled or its deadline is exceeded. |
| MaxRetryAttempts | 5 | The maximum retry attempts. This value limits any retry and hedging attempt values specified in the service config. Setting this value alone doesn't enable retries. Retries are enabled in the service config, which can be done using `ServiceConfig`. A `null` value removes the maximum retry attempts limit. For more information about retries, see <xref:grpc/retries>. |
| MaxRetryBufferSize | 16 MB | The maximum buffer size in bytes that can be used to store sent messages when retrying or hedging calls. If the buffer limit is exceeded then no more retry attempts are made and all hedging calls but one will be canceled. This limit is applied across all calls made using the channel. A `null` value removes the maximum retry buffer size limit. |
JamesNK marked this conversation as resolved.
Show resolved Hide resolved
| <span style="word-break:normal;word-wrap:normal">MaxRetryBufferPerCallSize</span> | 1 MB | The maximum buffer size in bytes that can be used to store sent messages when retrying or hedging calls. If the buffer limit is exceeded then no more retry attempts are made and all hedging calls but one will be canceled. This limit is applied to one call. A `null` value removes the maximum retry buffer size limit per call. |
JamesNK marked this conversation as resolved.
Show resolved Hide resolved
| ServiceConfig | `null` | The service config for a gRPC channel. A service config can be used to configure [gRPC retries](xref:grpc/retries). |

The following code:

Expand Down
140 changes: 140 additions & 0 deletions aspnetcore/grpc/retries.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,140 @@
---
title: Resilient gRPC calls with retries
author: jamesnk
description: Learn how to make resilient gRPC calls with retries in .NET.
monikerRange: '>= aspnetcore-3.0'
ms.author: jamesnk
ms.date: 02/25/2021
no-loc: [appsettings.json, "ASP.NET Core Identity", cookie, Cookie, Blazor, "Blazor Server", "Blazor WebAssembly", "Identity", "Let's Encrypt", Razor, SignalR]
uid: grpc/retries
---
# Resilient gRPC calls with retries

By [James Newton-King](https://twitter.com/jamesnk)

gRPC retries is a feature that allows gRPC clients to automatically retry failed calls. This article discusses how to configure a retry policy to make resilient, fault tolerant gRPC apps in .NET.

## Transient fault handling

gRPC calls can be interrupted by transient faults. Transient faults include:

* Momentary loss of network connectivity.
* Temporary unavailability of a service.
* Timeouts due to server load.

When a gRPC call is interrupted the client will throw an `RpcException` with details about the error. The client app must catch the exception and choose how to handle the error.
JamesNK marked this conversation as resolved.
Show resolved Hide resolved
JamesNK marked this conversation as resolved.
Show resolved Hide resolved

```csharp
var client = new Greeter.GreeterClient(channel);
try
{
var response = await client.SayHelloAsync(
new HelloRequest { Name = ".NET" });

Console.WriteLine("From server: " + response.Message);
}
catch (RpcException ex)
{
// Write logic to inspect the error and retry
wadepickett marked this conversation as resolved.
Show resolved Hide resolved
// if the error is from a transient fault.
wadepickett marked this conversation as resolved.
Show resolved Hide resolved
}
```

Duplicating retry logic throughout an app is verbose and error prone. Fortunately the .NET gRPC client has a built-in support for automatic retries.

## Configure a gRPC retry policy

A retry policy is configured once when a gRPC channel is created:

```csharp
var defaultMethodConfig = new MethodConfig
{
Names = { MethodName.Default },
RetryPolicy = new RetryPolicy
{
MaxAttempts = 5,
InitialBackoff = TimeSpan.FromSeconds(1),
MaxBackoff = TimeSpan.FromSeconds(5),
BackoffMultiplier = 1.5,
RetryableStatusCodes = { StatusCode.Unavailable }
}
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
ServiceConfig = new ServiceConfig { MethodConfigs = { defaultMethodConfig } }
});
```

The preceding code:

* Creates a `MethodConfig`. Retry policies can be configured per-method and methods are matched using the `Names` property. This method is configured with `MethodName.Default`, so it's applied to all gRPC methods called by this channel.
* Configures a retry policy. This policy instructs clients to automatically retry gRPC calls that fail with the status code `Unavailable`.
* Configures the created channel to use the retry policy by setting `GrpcChannelOptions.ServiceConfig`.

gRPC clients created with the channel will automatically retry failed calls:

```csharp
var client = new Greeter.GreeterClient(channel);
var response = await client.SayHelloAsync(
new HelloRequest { Name = ".NET" });

Console.WriteLine("From server: " + response.Message);
```

### gRPC retry options

The following table describes options for configuring gRPC retry policies:

| Option | Description |
| ------ | ----------- |
| `MaxAttempts` | The maximum number of call attempts, including the original attempt. This value is limited by `GrpcChannelOptions.MaxRetryAttempts` which defaults to 5. A value is required and must be greater than 1. |
| `InitialBackoff` | The initial backoff delay between retry attempts. A randomized delay between 0 and the current backoff determines when the next retry attempt is made. After each attempt, the current backoff is multiplied by `BackoffMultiplier`. A value is required and must be greater than zero. |
| `MaxBackoff` | The maximum backoff places an upper limit on exponential backoff growth. A value is required and must be greater than zero. |
| `BackoffMultiplier` | The backoff will be multiplied by this value after each retry attempt and will increase exponentially when the multiplier is greater than 1. A value is required and must be greater than 0. |
JamesNK marked this conversation as resolved.
Show resolved Hide resolved
| `RetryableStatusCodes` | A collection of status codes. A gRPC call that fails with a matching status will be automatically retried. For more information about status codes, see [Status codes and their use in gRPC](https://grpc.github.io/grpc/core/md_doc_statuscodes.html). At least one retryable status code is required. |

## Hedging

Hedging is an alternative retry strategy. Hedging enables aggressively sending multiple copies of a single gRPC call without waiting for a response. Hedged gRPC calls may be executed multiple times on the server and the first successful result is used. It's important that hedging is only enabled for methods that are safe to execute multiple times without adverse affect.
JamesNK marked this conversation as resolved.
Show resolved Hide resolved

Hedging has pros and cons when compared to retries:

* An advantage to hedging is it might return a successful result faster. It allows for multiple simultaneously gRPC calls and will complete when the first successful result is available.
* A disadvantage to hedging is it can be wasteful. Multiple calls could be made and all succeed. Only the first result is used and the rest are discarded.

## Configure a gRPC hedging policy

A hedging policy is configured like a retry policy. Note that a hedging policy can't be combined with a retry policy.

```csharp
var defaultMethodConfig = new MethodConfig
{
Names = { MethodName.Default },
HedgingPolicy = new HedgingPolicy
{
MaxAttempts = 5,
NonFatalStatusCodes = { StatusCode.Unavailable }
}
};

var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions
{
ServiceConfig = new ServiceConfig { MethodConfigs = { defaultMethodConfig } }
});
```

### gRPC hedging options

The following table describes options for configuring gRPC hedging policies:

| Option | Description |
| ------ | ----------- |
| `MaxAttempts` | The hedging policy will send up to this number of calls. `MaxAttempts` represents the total number of all attempts, including the original attempt. This value is limited by `GrpcChannelOptions.MaxRetryAttempts` which defaults to 5. A value is required and must be greater than 1. |
| `HedgingDelay` | The first call will be sent immediately, but the subsequent hedging calls will be delayed by this value. When the delay is set to zero all hedged calls are sent immediately. Default value is zero. |
JamesNK marked this conversation as resolved.
Show resolved Hide resolved
| `NonFatalStatusCodes` | A collection of status codes which indicate other hedge calls may still succeed. If a non-fatal status code is returned by the server, hedged calls will continue. Otherwise, outstanding requests will be canceled and the error returned to the app. For more information about status codes, see [Status codes and their use in gRPC](https://grpc.github.io/grpc/core/md_doc_statuscodes.html). |

## Additional resources

* <xref:grpc/client>
* [Retry general guidance - Best practices for cloud applications](/azure/architecture/best-practices/transient-faults)
2 changes: 2 additions & 0 deletions aspnetcore/toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -750,6 +750,8 @@
uid: grpc/clientfactory
- name: Deadlines and cancellation
uid: grpc/deadlines-cancellation
- name: Transient fault handling
wadepickett marked this conversation as resolved.
Show resolved Hide resolved
uid: grpc/retries
- name: gRPC services with ASP.NET Core
uid: grpc/aspnetcore
- name: Supported platforms
Expand Down