Race condition when disposing of channels while cancelling calls. #2119
Labels: bug (Something isn't working)
Comments
amanda-tarafa added three commits to amanda-tarafa/grpc-dotnet that referenced this issue on May 12, 2023.
@JamesNK we were about to release a patched beta of Google.Cloud.PubSub.V1, but if the release of this fix is more or less imminent, we'd probably want to wait. Do you have a rough idea of when it will be released? Thanks!
I'm going to start the 2.54.0 release process, but it will take a couple of weeks to have a preview release and then a final release.
I think there might be a race condition, ending in a deadlock, when a channel is disposed at the same time as one of its active calls is cancelled separately. I believe this only happens for server streaming or bidirectional streaming calls. The original report (googleapis/google-cloud-dotnet#10318) comes from a Google.Cloud.PubSub.V1 user who has provided a repro and logs. A couple of us on the Google side haven't been able to reproduce it, but I think my description of the race condition below is accurate, and it's supported by the user-provided logs (ignoring the Google.Cloud.PubSub.V1 part of the logs).
Grpc.Net.Client.GrpcChannel.Dispose disposes of active calls within a lock.
grpc-dotnet/src/Grpc.Net.Client/GrpcChannel.cs, lines 741 to 754 (at commit 38738af)
Grpc.Net.Client.GrpcCall.Dispose essentially calls GrpcCall.Cleanup with a cancelled status. GrpcCall.Cleanup then sets the call's TaskCompletionSource to cancelled:
grpc-dotnet/src/Grpc.Net.Client/Internal/GrpcCall.cs, line 211 (at commit 38738af)
But if this GrpcCall is already being cancelled elsewhere (i.e., on a different thread), setting the TaskCompletionSource to cancelled blocks until the cancellation callbacks running on that other thread have completed. Note that the disposing thread blocks while holding the channel lock.
In the meantime, the thread that first cancelled the GrpcCall has been able to proceed through:
grpc-dotnet/src/Grpc.Net.Client/Internal/GrpcCall.cs, lines 73 to 74 (at commit 38738af)
grpc-dotnet/src/Grpc.Net.Client/Internal/GrpcCall.cs, line 621 (at commit 38738af)
grpc-dotnet/src/Grpc.Net.Client/Internal/GrpcCall.cs, line 639 (at commit 38738af)
which in turn calls GrpcChannel.FinishActiveCall
grpc-dotnet/src/Grpc.Net.Client/Internal/GrpcCall.cs, line 218 (at commit 38738af)
which requires the channel lock. That lock is held by the thread disposing the channel, which is in turn waiting for this thread to finish executing the cancellation callback:
grpc-dotnet/src/Grpc.Net.Client/GrpcChannel.cs, lines 490 to 501 (at commit 38738af)
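The lock cycle described above can be reproduced in a small model. This is a hypothetical Python sketch, not grpc-dotnet code: the function name, the "call-1" entry and the events are all illustrative. It assumes, as the report describes, that disposing a call whose cancellation is already in flight waits for that cancellation callback to finish. Thread A (channel dispose) holds the lock while waiting on thread B (the canceller), and B needs the same lock to finish. The real wait is unbounded; the model adds a timeout only so it reports the deadlock instead of hanging.

```python
import threading


def dispose_while_holding_lock(timeout=0.5):
    lock = threading.Lock()          # hypothetical stand-in for the channel lock
    active_calls = {"call-1"}        # hypothetical stand-in for the active calls collection
    callback_started = threading.Event()
    callback_done = threading.Event()

    def cancellation_callback():
        # Thread B: the thread that cancelled the call first.
        callback_started.set()
        with lock:                   # like GrpcChannel.FinishActiveCall, needs the lock
            active_calls.discard("call-1")
        callback_done.set()

    canceller = threading.Thread(target=cancellation_callback, daemon=True)

    with lock:                       # Thread A: channel dispose takes the lock...
        canceller.start()
        callback_started.wait()      # force the interleaving described in the issue
        calls = list(active_calls)
        # ...and, still holding the lock, waits for B's callback to finish.
        # Assumption: disposing an already-cancelling call waits for its callback.
        finished = all(callback_done.wait(timeout) for _ in calls)

    canceller.join()                 # once A releases the lock, B can finish
    return finished, active_calls


print(dispose_while_holding_lock())  # (False, set())
```

A's wait times out (False) because B is stuck on the lock; B only completes after A gives the lock up.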
Possible solution:
In GrpcChannel.Dispose, release the lock immediately after taking the local copy of the active calls collection, so that the calls are disposed without holding the lock. This would avoid the race condition and, as far as I can see, would introduce no side effects.
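A hypothetical Python model of the proposed fix (not grpc-dotnet code; all names are illustrative). It forces the interleaving from the issue, but takes the local copy of the active calls under the lock and releases the lock before waiting on any call. It keeps the report's assumption that disposing a call waits for an in-flight cancellation callback.

```python
import threading


def dispose_after_releasing_lock(timeout=5.0):
    lock = threading.Lock()          # hypothetical stand-in for the channel lock
    active_calls = {"call-1"}        # hypothetical stand-in for the active calls collection
    callback_started = threading.Event()
    callback_done = threading.Event()

    def cancellation_callback():
        # Thread B: the thread that cancelled the call first.
        callback_started.set()
        with lock:                   # like GrpcChannel.FinishActiveCall, needs the lock
            active_calls.discard("call-1")
        callback_done.set()

    canceller = threading.Thread(target=cancellation_callback, daemon=True)

    with lock:                       # Thread A: channel dispose takes the lock
        canceller.start()
        callback_started.wait()      # force the interleaving described in the issue
        calls = list(active_calls)   # local copy taken under the lock
    # The lock is released here, before waiting on any call, so B can take it.
    finished = all(callback_done.wait(timeout) for _ in calls)

    canceller.join()
    return finished, active_calls


print(dispose_after_releasing_lock())  # (True, set())
```

In this model, waiting on a call whose callback has already completed is harmless, since Event.wait returns True immediately once the event is set.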