Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling blocking code from SecurityIdentityAugmentor causes concurrent requests to be limited by the number of event loop threads #43217

Closed
lorenzobenvenuti opened this issue Sep 11, 2024 · 11 comments · Fixed by #43248
Labels
area/security kind/bug Something isn't working
Milestone

Comments

@lorenzobenvenuti
Copy link

Describe the bug

Hi,

we have an application calling some blocking code from a SecurityIdentityAugmentor (using authenticationRequestContext.runBlocking); under some circumstances, this code may take longer than expected and when this happens we've noticed that requests seemed to be queued. After some investigation we found out that the application can't handle more than N requests where N = number of event loop threads... but, since the blocking tasks run in a worker thread, I was expecting the application to handle up to M requests, where M = number of worker threads.

For example, say the event loop has 2 threads and the blocking task takes 5s: if three users perform a request at the same time, two requests will take 5s and one will take 10s because it won't be processed until the first two complete. However, if an unauthenticated request (i.e. blocking task is not run) is performed while some blocking tasks are running, it's served immediately which shows that the event loop threads are not busy.

You can find a reproducer here: https://github.com/lorenzobenvenuti/security-augmentor-blocking-issue

Thanks,

lorenzo

Expected behavior

Number of concurrent requests is limited by the number of worker threads

Actual behavior

Number of concurrent requests is limited by the number of event loop threads

How to Reproduce?

https://github.com/lorenzobenvenuti/security-augmentor-blocking-issue

Output of uname -a or ver

Linux 6.10.6-100.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Aug 19 14:35:32 UTC 2024 x86_64 GNU/Linux

Output of java -version

openjdk version "17.0.12" 2024-07-16 OpenJDK Runtime Environment (Red_Hat-17.0.12.0.7-2) (build 17.0.12+7) OpenJDK 64-Bit Server VM (Red_Hat-17.0.12.0.7-2) (build 17.0.12+7, mixed mode, sharing)

Quarkus version or git rev

3.14.2 (also tested with 3.8 which is the one used by our app)

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.9.1 (Red Hat 3.9.1-3)

Additional information

No response

@lorenzobenvenuti lorenzobenvenuti added the kind/bug Something isn't working label Sep 11, 2024
@quarkus-bot
Copy link

quarkus-bot bot commented Sep 11, 2024

/cc @sberyozkin (security)

@sberyozkin
Copy link
Member

sberyozkin commented Sep 11, 2024

Thanks @lorenzobenvenuti, I wonder though, is it a bug or the way the pools are managed right now in general and it really should be re-qualified into an enhancement request.
Let me CC @cescoffier @geoand @jponge
Michal, @michalvavrik, I don't recall us tuning it specifically for the security work:

https://github.com/quarkusio/quarkus/blob/main/extensions/vertx-http/runtime/src/main/java/io/quarkus/vertx/http/runtime/security/VertxBlockingSecurityExecutor.java

May be it can be just configured, to control the number the pool threads etc ?

@michalvavrik
Copy link
Member

michalvavrik, I don't recall us tuning it specifically for the security work:

https://github.com/quarkusio/quarkus/blob/main/extensions/vertx-http/runtime/src/main/java/io/quarkus/vertx/http/runtime/security/VertxBlockingSecurityExecutor.java
May be it can be just configured, to control the number the pool threads etc ?

The VertxBlockingSecurityExecutor runs blocking code on duplicated context. Honestly I expected that it schedule execution on the worker thread and isn't limiting the event loop until the blocking phase is over. I checked quickly and I can see WS Next and RESTEasy Classic doing same at few places.

@sberyozkin yeah, let's wait till Clemetn, Julien Pong or Viet finds a time and have a quick look on how we do blocking. Thanks

@cescoffier
Copy link
Member

makes sure your executeBlocking set ordered=false.

@michalvavrik
Copy link
Member

michalvavrik commented Sep 12, 2024

makes sure your executeBlocking set ordered=false.

Yes, alright, our executeBlocking doesn't set it and implicit is true. I am finishing other PR, but then I'll run reproducer with/without this change and test it at least locally. Also I need to check docs bit to understand if I am going to change it [understand the matter]. Thanks @cescoffier

@michalvavrik
Copy link
Member

michalvavrik commented Sep 12, 2024

makes sure your executeBlocking set ordered=false.

https://vertx.io/docs/vertx-core/java/#blocking_code says By default, if executeBlocking is called several times from the same context then the different executeBlocking are executed serially.. But:

  • we run blocking tasks on duplicated context
  • each HTTP request is dispatched on new duplicated context
  • we have ordered executing of blocking tasks
  • if augmentor 1 is running blocking task, we want augmentor 2 to consume that task (which is happening), so there in fact some order is natural

What doesn't add up is that:

  • this issue description says However, if an unauthenticated request (i.e. blocking task is not run) is performed while some blocking tasks are running, it's served immediately which shows that the event loop threads are not busy.
  • there is only one augmentor -----> there is only one blocking call per HTTP request

Do I understand it right @cescoffier that ordered=false is not per one duplicated context but per all HTTP requests coming to Quarkus? I would say no, in which case I don't know how could this be cause of the issue.

Any hint would be appreciated when the time is right for you.

@cescoffier
Copy link
Member

No ordered is not about duplicated but root context (event loops).

@michalvavrik
Copy link
Member

No ordered is not about duplicated but root context (event loops).

Thanks. In that case, we don't want order. I'll re-check reproducer later.

@michalvavrik
Copy link
Member

Thank you very much @lorenzobenvenuti for reporting this issue with very nice description. I didn't use your reproducer but it's always great to have it.

@quarkus-bot quarkus-bot bot added this to the 3.16 - main milestone Sep 13, 2024
@lorenzobenvenuti
Copy link
Author

Thanks all for the quick fix! Will the fix be backported to 3.8 or do we need to wait until 3.15 (we want to stick to LTS versions).

Thanks,

lorenzo

@michalvavrik
Copy link
Member

Thanks all for the quick fix! Will the fix be backported to 3.8 or do we need to wait until 3.15 (we want to stick to LTS versions).

Thanks,

lorenzo

Added backport label for 3.8, it's proposal. Backports to 3.8 needs to be conservative, so please watch backports linked to this issue/commit that fixed it.

@gsmet gsmet modified the milestones: 3.16 - main, 3.15.0, 3.14.4 Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/security kind/bug Something isn't working
Projects
None yet
5 participants