Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validator improvements: use a httpx session/client and a custom User-Agent #1188

Closed
wants to merge 2 commits into from

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented May 24, 2022

This PR drops the requests dependency and replaces it with httpx, which will allow async applications in the future (and is used by the upcoming client).

It also:

  1. Uses a httpx.Client (equivalent to a requests.Session) which seems to speed-up validation by 10-20% for odbx (will observe build time differences for the dashboard).
  2. Defines a custom User-Agent for the validator that can be used by databases to prioritise/drop requests (closes The validator should use a custom User-Agent header #1187).

@ml-evs ml-evs added enhancement New feature or request validator Related to the OPTIMADE validator labels May 24, 2022
@ml-evs ml-evs added this to the v0.18.0 milestone May 24, 2022
@ml-evs
Copy link
Member Author

ml-evs commented May 24, 2022

This PR should probably only be merged once the client is ready, before a 0.18 release.

Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing really to put my finger on, just a lot of surrounding questions and comments.

@@ -13,6 +13,7 @@
"""

import time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? I think it might be nice to use isort as a pre-commit hook? Then all import statements will be unified for all :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously just a typo, but yeah, we should probably start using isort. Will remove this diff

Suggested change

"""Close the underlying httpx client, if it is still open."""
try:
self.http_client.close()
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there an explicit exception for this to use instead, so that unexpected exceptions are not missed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was in two minds here really, the exceptions raised by close() are not documented so I don't think we can necessarily rely on them. The method already has handling for closing already closed connections (it just skips) so I think this is safe enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@CasperWA CasperWA May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this seems more like you should simply call self.http_client.close() and that's that? No try/except. If an exception is raised it's something to be concerned about, if not, then it's already closed or it was successfully closed.

optimade/validator/utils.py Show resolved Hide resolved
sys.exit(
f"Unable to make request on {self.last_request}, did you mean http://{self.last_request}?"
f"Unable to make request on {self.last_request!r}, did you mean 'https://{self.last_request}'?\nFull error: {exc}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running the validator, should we have an outer try/except (perhaps for a custom catch-'em-all ValidatorException or similar?) to always add this extra Full error part instead? Then we can just raise with an appropriate error message here instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have this with InternalError being "anything unexpected", but this is the one case where we want to end all further requests as the URL will remain malformed for the duration. This is unrelated to this PR really

Copy link
Member

@CasperWA CasperWA May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to this PR really

Sure - the majority of these comments are...

We already have this with InternalError being "anything unexpected", but this is the one case where we want to end all further requests as the URL will remain malformed for the duration.

Okay - I thought it would always simply stop if something is raised, but I guess the catch-'em-all try/except simply collects exceptions then, instead of stopping? Which is definitely the way it should work in general.

Maybe then instead we could provide the exception with an agreed extra property, like hard_fail or force_fail or end, which the "outer" try/except checks before then either stopping or collecting and moving on?
And maybe this is a moot point. I'm just generally disliking using sys.exit() in a place that doesn't also directly handle terminal stuff. In a class like this it's weird - especially if testing this code through unit testing, etc.

finally:
try:
self.client.close()
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, maybe a specific exception here? I understand you at all costs want to close it, but passing on an all exceptions won't necessarily do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Also in this case self.client can be a custom class passed by the user (we use the starlette test client in our tests) so I think we are forced to keep it general.

@ml-evs
Copy link
Member Author

ml-evs commented May 24, 2022

I've been trying to unpick the requests dependency in testing, but its proving quite tricky... the underlying starlette TestClient uses requests, and a lot of our testing mechanisms rely on mocking requests.get (difficult to get around when using a session).

I'm going to pick out the user-agent commit and make a new PR, will leave the rest of this on hold for now, perhaps until starlette support httpx (same authors) - see encode/starlette#652

@ml-evs ml-evs closed this May 24, 2022
@ml-evs ml-evs added the on-hold For PRs/issues that are on-hold for an unspecified time label May 24, 2022
@CasperWA
Copy link
Member

I've been trying to unpick the requests dependency in testing, but its proving quite tricky... the underlying starlette TestClient uses requests, and a lot of our testing mechanisms rely on mocking requests.get (difficult to get around when using a session).

I'm going to pick out the user-agent commit and make a new PR, will leave the rest of this on hold for now, perhaps until starlette support httpx (same authors) - see encode/starlette#652

Either that, or you can check out the setup I have for OPTIMADE Gateway (https://github.com/Materials-Consortia/optimade-gateway/blob/main/tests/conftest.py#L90, as well as the rest of the test files).

@ml-evs
Copy link
Member Author

ml-evs commented May 24, 2022

Either that, or you can check out the setup I have for OPTIMADE Gateway (https://github.com/Materials-Consortia/optimade-gateway/blob/main/tests/conftest.py#L90, as well as the rest of the test files).

One for another day ;)

@ml-evs
Copy link
Member Author

ml-evs commented May 24, 2022

I will have to refactor some of the test client in #1154 anyway, so maybe I'll find the motivation to integrate this then

@ml-evs ml-evs deleted the ml-evs/use_httpx_everywhere branch December 6, 2022 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request on-hold For PRs/issues that are on-hold for an unspecified time validator Related to the OPTIMADE validator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The validator should use a custom User-Agent header
2 participants