Schema Registry and REST Proxy: health checks #7080

Merged
merged 2 commits into from
Dec 27, 2022

Conversation

BenPope
Member

@BenPope BenPope commented Nov 3, 2022

Cover letter

Add health checks available at:

  • :8081/status/ready
  • :8082/status/ready

Both return 200 on success and 503 on failure. Each check verifies that the service has a connection to Redpanda.

These are designed to replace the corresponding:

  • :8081/subjects
  • :8082/brokers

which are recommended today. The health checks can be improved upon at a later date.
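
As a rough illustration of how a client might consume the new endpoints, the sketch below issues a GET to /status/ready and maps 200 to "ready" and anything else (503, other HTTP errors, a failed connection) to "not ready". The helper name is_ready, the timeout, and the base URLs are assumptions made for this example; they are not part of this PR.

```python
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/status/ready answers 200.

    `is_ready` is a hypothetical helper, not an API provided by Redpanda.
    """
    url = base_url.rstrip("/") + "/status/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # urllib raises HTTPError for 4xx/5xx responses, including 503.
        return False
    except OSError:
        # Connection refused, DNS failure, timeout, and similar
        # (URLError is a subclass of OSError).
        return False
```

Assuming default local ports, is_ready("http://localhost:8081") would check Schema Registry and is_ready("http://localhost:8082") would check REST Proxy.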

Backport Required

Let's backport; it's simple and a vast improvement on:

  • :8081/subjects
  • :8082/topics

  • not a bug fix
  • issue does not exist in previous branches
  • papercut/not impactful enough to backport
  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

Health checks are available at /status/ready and return a 200 or 503 status code depending on whether REST Proxy or Schema Registry is able to connect to Redpanda.

A good use case is to inform a load balancer for routes that don't need to be routed to a specific instance of the service (consumer endpoints on REST Proxy).

It is not a good idea to use these as a Kubernetes probe, or other signal that the service is down, as that usually results in the orchestrator killing Redpanda, which is undesirable.
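
The load balancer use case can be sketched as follows: keep only the backends whose readiness probe succeeds, and take no action against the ones that fail. The function name ready_backends and the probe callback are hypothetical; the probe is assumed to return True when GET <backend>/status/ready answers 200.

```python
from typing import Callable, Iterable, List


def ready_backends(backends: Iterable[str],
                   probe: Callable[[str], bool]) -> List[str]:
    """Return the backends whose readiness probe succeeds, in input order.

    `probe` is an assumed callback: True for a 200 from /status/ready,
    False for a 503 or a connection failure.
    """
    healthy = []
    for backend in backends:
        try:
            if probe(backend):
                healthy.append(backend)
        except Exception:
            # A probe that raises is treated like a 503: the backend is
            # skipped for routing, but nothing is restarted or killed.
            continue
    return healthy
```

The result only influences routing, which matches the advice above not to treat a failed check as a signal that the service should be killed.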

Release notes

Features

  • REST Proxy: A health check is now available at :8082/status/ready
  • Schema Registry: A health check is now available at :8081/status/ready

@BenPope BenPope changed the title from "schema_registry and REST Proxy: healtch checkshealth checks" to "Schema Registry and REST Proxy: health checks" Nov 3, 2022
NyaliaLui previously approved these changes Nov 3, 2022
Contributor

@NyaliaLui NyaliaLui left a comment

LGTM when CI is green.

src/v/pandaproxy/api/api-doc/rest.json
Comment on lines 150 to 161
if (ex.error == error_code::network_exception) {
    vlog(kclog.warn, "broker_error: {}", ex);
    return ss::make_exception_future(ex);
}
Member Author

I may have to have another think about this. The issue may have been introduced in 3ca8861

Member Author

For posterity, the fix is here: #7735

@BenPope
Member Author

BenPope commented Dec 22, 2022

force-push was a clean rebase

Signed-off-by: Ben Pope <ben@redpanda.com>
@BenPope
Member Author

BenPope commented Dec 22, 2022

force-push drops the first commit, as it's now handled in #7735

@NyaliaLui
Contributor

Is this ready?

@BenPope
Member Author

BenPope commented Dec 22, 2022

Is this ready?

200

@BenPope BenPope self-assigned this Dec 22, 2022
@BenPope BenPope added the area/schema-registry (Schema Registry service within Redpanda) and area/pandaproxy (REST interface for Kafka API) labels Dec 22, 2022
@BenPope
Member Author

BenPope commented Dec 22, 2022

Known and unrelated CI failures:

--- FAIL: kuttl/harness (0.00s)
--
  | --- FAIL: kuttl/harness/centralized-configuration (300.56s)
  | --- FAIL: kuttl/harness/update-conf-image (300.65s)
  | --- FAIL: kuttl/harness/decommission (302.01s)
  | FAIL

@RafalKorepta
Contributor

Known and unrelated CI failures:

--- FAIL: kuttl/harness (0.00s)
--
  | --- FAIL: kuttl/harness/centralized-configuration (300.56s)
  | --- FAIL: kuttl/harness/update-conf-image (300.65s)
  | --- FAIL: kuttl/harness/decommission (302.01s)
  | FAIL

The problem with our CI output is that the end of the output shows the e2e-unstable tests. @ivotron configured the whole k8s pipeline to run the stable tests, save the exit code, run the unstable tests, and fail if the stable exit code is non-zero.
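
The ordering described above can be sketched as follows. This is only an illustration of the logic as described in this comment; the real pipeline lives in vtools and is not shown here, and the function name run_suites and its command arguments are hypothetical.

```python
import subprocess
from typing import Sequence


def run_suites(stable_cmd: Sequence[str], unstable_cmd: Sequence[str]) -> int:
    """Run the stable suite, then the unstable suite.

    Only the stable suite's exit code decides the overall result; the
    unstable suite always runs, but its exit code is ignored.
    """
    stable_rc = subprocess.run(stable_cmd).returncode
    # The unstable output is printed last, which is why its failures
    # appear at the end of the log even when they do not fail the build.
    subprocess.run(unstable_cmd)
    return stable_rc
```

A caller would exit with the returned value, so a non-zero stable exit code fails the job regardless of what the unstable suite printed last.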

https://github.com/redpanda-data/vtools/pull/551/files#r940104919

The ones that fail:

        --- FAIL: kuttl/harness/update-image-and-node-port (633.32s)
        --- FAIL: kuttl/harness/update-image-tls (743.76s)

Member

@dotnwat dotnwat left a comment

this looks good to me.

did you consider not doing a read/metadata request for each status check and instead keep a little state like: return healthy if "anything that implies healthy" was done recently?
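
A minimal sketch of the alternative suggested here, assuming the service records a timestamp whenever some operation that implies a healthy connection succeeds, and the status check only compares that timestamp against a maximum age. The class name RecentActivityHealth, the 10-second default, and the injectable clock are all hypothetical; this PR does not implement it, and the actual service is written in C++.

```python
import time
from typing import Callable, Optional


class RecentActivityHealth:
    """Report healthy if a success was recorded recently enough."""

    def __init__(self, max_age_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age_s = max_age_s
        self._clock = clock
        self._last_ok: Optional[float] = None

    def record_success(self) -> None:
        # Call this after any operation that implies a working connection.
        self._last_ok = self._clock()

    def is_healthy(self) -> bool:
        # No request is made here; only the stored timestamp is inspected.
        if self._last_ok is None:
            return False
        return self._clock() - self._last_ok <= self._max_age_s
```

One trade-off of this design is that an idle service has no recent successes to report, so it would still need an occasional active check to avoid reporting unhealthy.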

@BenPope
Member Author

BenPope commented Dec 23, 2022

There's a lot that could be improved. The goal here was to do basically the minimum possible as a strict improvement over /subjects or /topics and have an official API that we could document, and then improve behind the scenes.

@dotnwat
Member

dotnwat commented Dec 27, 2022

/ci-repeat 1

@dotnwat
Member

dotnwat commented Dec 27, 2022

Failure is k8s

@dotnwat dotnwat merged commit 4dd1e57 into redpanda-data:dev Dec 27, 2022
@piyushredpanda
Contributor

Awesome to get this done. @BenPope: has there been a thread to let our cloudv2/SRE folks know to use this?

@piyushredpanda
Contributor

Erm, of course we need backporting first and for it to be released. @BenPope: you driving the backports?

@BenPope
Member Author

BenPope commented Jan 3, 2023

/backport v22.3.x

@BenPope
Member Author

BenPope commented Jan 3, 2023

/backport v22.2.x

@BenPope
Member Author

BenPope commented Jan 3, 2023

/backport v22.1.x

Labels
area/pandaproxy (REST interface for Kafka API), area/redpanda, area/schema-registry (Schema Registry service within Redpanda)
5 participants