grpc/outlier: inspect response grpc status code to determine host success rate #860
Comments
cc @louiscryan |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions. |
cc @lizan I don't have time to pick this up, but for context there is a semi-canonical mapping from gRPC to HTTP status codes that might be useful: https://github.com/grpc/grpc/blob/master/doc/http-grpc-status-mapping.md |
Yes, Envoy has implemented a reverse mapping as well. |
As we work on the fix, it seems that the gRPC trailer status needs to be handled explicitly (possibly as a new concept) in OutlierDetector. Extending the non-HTTP Result enum doesn't seem right either. Any thoughts on this? I'd like to file a new issue to add explicit trailer status support in OutlierDetector. |
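For readers following the link: the table in that document is used when a response arrives without a `grpc-status`, so a gRPC status has to be derived from the HTTP status. A minimal Python sketch of that table follows. The function and dictionary names are hypothetical and are not Envoy or gRPC APIs.

```python
from enum import IntEnum


class GrpcStatus(IntEnum):
    """Subset of gRPC status codes needed for the mapping table."""
    UNKNOWN = 2
    PERMISSION_DENIED = 7
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


# HTTP status -> gRPC status, as listed in http-grpc-status-mapping.md.
_HTTP_TO_GRPC = {
    400: GrpcStatus.INTERNAL,
    401: GrpcStatus.UNAUTHENTICATED,
    403: GrpcStatus.PERMISSION_DENIED,
    404: GrpcStatus.UNIMPLEMENTED,
    429: GrpcStatus.UNAVAILABLE,
    502: GrpcStatus.UNAVAILABLE,
    503: GrpcStatus.UNAVAILABLE,
    504: GrpcStatus.UNAVAILABLE,
}


def http_to_grpc_status(http_status: int) -> GrpcStatus:
    """Hypothetical helper: derive a gRPC status for a non-200 HTTP response."""
    # Any HTTP status not listed in the table maps to UNKNOWN.
    return _HTTP_TO_GRPC.get(http_status, GrpcStatus.UNKNOWN)
```

For example, `http_to_grpc_status(503)` yields `GrpcStatus.UNAVAILABLE`, while an unlisted code such as 418 falls back to `GrpcStatus.UNKNOWN`.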
@wenbozhu can you describe in more detail what the issue is? Unfortunately the outlier detection code and the way we do mappings is incredibly complicated. cc @cpakulski. |
Currently, the gRPC status in response headers is not checked in OutlierDetector. We are fixing this behavior with #7942. We believe this is the minimum fix required to make OutlierDetector work for gRPC services.
However, we also need to define how OutlierDetector should handle the trailer status (when it's an error). We believe we should not delay recording the initial status until the trailer is received or the stream is aborted. For monitoring, we want to differentiate between immediate failures and "partial" failures as indicated by an error trailer status.
For ejection etc., I think we need more discussion, e.g. whether a new API needs to be introduced or whether we should just treat trailer errors the same way as HTTP error statuses. Even with the latter approach, there will be extra complexity to handle the extra status for a single request.
Also, as I understand it, HTTP doesn't allow the status code (header) to be overwritten in a trailer, so the trailer status support will be unique to gRPC requests. |
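To make the "immediate" versus "partial" distinction concrete, here is a rough Python sketch of a per-request record. It is only one possible shape, not what #7942 implements: the `StreamOutcome` class and its method names are hypothetical, and treating every HTTP 5xx and every non-zero `grpc-status` as an error is a simplifying assumption.

```python
from typing import Optional


class StreamOutcome:
    """Hypothetical per-request record; not an Envoy type."""

    def __init__(self) -> None:
        self.initial_ok: Optional[bool] = None  # set when headers arrive
        self.trailer_ok: Optional[bool] = None  # set when trailers arrive

    def on_headers(self, http_status: int, grpc_status: Optional[int] = None) -> None:
        # Record the initial status right away instead of waiting for trailers.
        # A trailers-only gRPC response carries grpc-status in the headers.
        # Assumption: HTTP 5xx or any non-zero grpc-status counts as an error.
        self.initial_ok = http_status < 500 and grpc_status in (None, 0)

    def on_trailers(self, grpc_status: int) -> None:
        # The trailer status is kept separately so it cannot overwrite
        # (or double count) the status recorded from the headers.
        self.trailer_ok = grpc_status == 0

    def classify(self) -> str:
        # Exactly one label per request, so a request is never counted twice.
        if self.initial_ok is None:
            return "pending"
        if not self.initial_ok:
            return "immediate_failure"
        if self.trailer_ok is False:
            return "partial_failure"
        return "success"
```

A stream that starts with HTTP 200 and later ends with an error trailer would be labelled `partial_failure`, whereas a 503 or an error `grpc-status` in the headers is an `immediate_failure`. Whether partial failures should feed ejection is the open question raised above.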
Thanks for the detail. I will review #7942 as I agree this is very complicated and it took us a while to get through the last round of changes in this area that @cpakulski was doing.
Agreed we need to think about this carefully, double counting has caused us issues in the past. We may need to split out gRPC into its own thing. Let's discuss in the other PR and I'm hoping @cpakulski can weigh in also. |
Hello, please give me a few days and I will chime in.
|
Fixed in #7942 |
Description: move factory registration out of the engine file. This will eventually allow third party builds to replace the extensions installed by the build system. The directory name and location is not necessarily final; this is just the first step in separating this out to its own target. Risk Level: low Testing: existing CI Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>
Right now we base success rate outlier detection only on HTTP status codes, so request failures that are reported only through gRPC status codes are not counted.
The router filter should look for gRPC status codes and feed a failure to the outlier detector when the HTTP status code indicates success but the gRPC status does not.
Related to #721
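The proposed check can be sketched in a few lines of Python. This is an illustration of the idea, not the router filter's code: `is_request_success` is a hypothetical name, header keys are assumed to be lowercase, and the set of gRPC codes treated as host failures is an assumption chosen for the example.

```python
from typing import Mapping

# Assumption: only these gRPC codes are blamed on the upstream host
# (UNKNOWN, DEADLINE_EXCEEDED, INTERNAL, UNAVAILABLE). The right set is
# a policy decision and is not specified in this issue.
HOST_FAILURE_GRPC_CODES = frozenset({2, 4, 13, 14})


def is_request_success(http_status: int, response_headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: decide what to report to the outlier detector."""
    # Existing behavior: an HTTP 5xx is a failure regardless of gRPC status.
    if http_status >= 500:
        return False

    raw = response_headers.get("grpc-status")
    if raw is None:
        # Not a gRPC response (or no status yet): rely on the HTTP status alone.
        return True

    try:
        grpc_status = int(raw)
    except ValueError:
        # Assumption: a malformed grpc-status is treated as a failure.
        return False

    # New behavior: an HTTP success can still be a failure for gRPC.
    return grpc_status not in HOST_FAILURE_GRPC_CODES
```

With this sketch, a response with HTTP 200 and `grpc-status: 14` is reported as a failure, while HTTP 200 with `grpc-status: 0` (or no `grpc-status` at all) is reported as a success.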