-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schema Registry and REST Proxy: health checks #7080
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM when CI is green.
src/v/kafka/client/client.cc
Outdated
if (ex.error == error_code::network_exception) { | ||
vlog(kclog.warn, "broker_error: {}", ex); | ||
return ss::make_exception_future(ex); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have to have another think about this. The issue may have been introduced in 3ca8861
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For posterity, the fix is here: #7735
0d2ae38
to
fa4619e
Compare
force-push was a clean rebase |
Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
fa4619e
to
35f1fd0
Compare
force-push drops the first commit, as it's now handled in #7735 |
IS this ready? |
200 |
Known and unrelated CI failures:
|
The problem with the output of our CI is that at the end of the out put you see the https://github.com/redpanda-data/vtools/pull/551/files#r940104919 The one that fails:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks to me.
did you consider not doing a read/metadata request for each status check and instead keep a little state like: return healthy if "anything that implies healthy" was done recently?
There's a lot that could be improved. The goal here was to do basically the minimum possible as a strict improvement over |
/ci-repeat 1 |
Failure is k8s |
Awesome to get this done. @BenPope: has there been a thread to let our cloudv2/SRE folks know to use this? |
Erm, of course we need backporting first and for it to be released. @BenPope: you driving the backports? |
/backport v22.3.x |
/backport v22.2.x |
/backport v22.1.x |
Cover letter
Add health checks available at:
:8081/status/ready
:8082/status/ready
Return
200
on success and503
on failure. These check that there is a connection between the service and Redpanda.These are designed to replace the corresponding:
:8081/subjects
:8082/brokers
That are recommended today. They can be improved upon at a later date.
Backport Required
Lets backport; it's simple and a vast improvement on:
:8081/subjects
:8082/topics
UX changes
Health checks are available at
/status/ready
and will return200
or503
status code depending on whether REST Proxy or Schema Registry is able to connect to Redpanda.A good use case is to inform a loadbalancer for routes that don't require to be routed to a specific instance of the service (consumer endpoints on REST Proxy).
It is not a good idea to use these as a Kubernetes probe, or other signal that the service is down, as that usually results in the orchestrator killing Redpanda, which is undesirable.
Release notes
Features
:8082/status/ready
:8081/status/ready