Description
When I configure the model's queue policy via the "default_timeout_microseconds" or "max_queue_size" parameters, Triton does not handle requests the way I expect.
Triton Information
Docker image: nvcr.io/nvidia/tritonserver:23.03-py3
To Reproduce
Model
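A minimal sketch of the kind of model that shows the problem: a Python backend model that just sleeps, so that subsequent requests have to wait in the scheduler queue (the tensor names and the sleep duration below are illustrative, not the exact values I use):

```python
import time

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Sleep long enough that later requests pile up in the queue.
            time.sleep(2)
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            output_tensor = pb_utils.Tensor("OUTPUT0", input_tensor.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output_tensor])
            )
        return responses
```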
Model Config
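A sketch of the relevant part of config.pbtxt; the queue policy goes under dynamic_batching.default_queue_policy (the backend, shapes, instance count, and the concrete timeout value here are illustrative placeholders):

```
name: "queue_policy_test"
backend: "python"
max_batch_size: 1

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]

dynamic_batching {
  default_queue_policy {
    timeout_action: REJECT
    default_timeout_microseconds: 100000
  }
}
```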
Test Case
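Roughly, the "test_timeout" test case sends several concurrent requests and reports which of them complete and which are rejected. A sketch of it, using the tritonclient HTTP client (model name, tensor names, and request count are illustrative):

```python
import numpy as np
import tritonclient.http as httpclient

MODEL_NAME = "queue_policy_test"  # illustrative model name
NUM_REQUESTS = 4

# A single client with enough connection concurrency for all in-flight requests.
client = httpclient.InferenceServerClient(
    url="localhost:8000", concurrency=NUM_REQUESTS
)

inputs = [httpclient.InferInput("INPUT0", [1, 1], "FP32")]
inputs[0].set_data_from_numpy(np.zeros((1, 1), dtype=np.float32))

# Issue all requests without waiting, then collect the results.
pending = [client.async_infer(MODEL_NAME, inputs) for _ in range(NUM_REQUESTS)]
for i, req in enumerate(pending):
    try:
        req.get_result()
        print(f"request {i}: completed")
    except Exception as err:  # rejected requests raise an error on get_result()
        print(f"request {i}: rejected ({err})")
```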
Expected behavior
Since
I expect:
but I get:
Logs produced by the test case:
I am experiencing a similar issue with another queue policy option. Let's modify the configuration I provided above by setting "max_queue_size":
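(sketched with an illustrative value of 1, i.e. at most one request waiting in the queue)

```
dynamic_batching {
  default_queue_policy {
    timeout_action: REJECT
    default_timeout_microseconds: 100000
    max_queue_size: 1
  }
}
```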
So, if I send 4 requests, they are all processed; but if I send 5 requests, only the last one is rejected with the "Exceeds maximum queue size" error.
Quick investigation
System monitoring shows three additional threads of the main process running while the test executes.
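For reference, a rough way to watch the server's thread count from Python while the test runs (this only shows the total thread count, not which threads are busy; it assumes psutil is installed and a single local tritonserver process):

```python
import psutil

# Print the total thread count of the local tritonserver process.
for proc in psutil.process_iter(["name", "num_threads"]):
    if proc.info["name"] == "tritonserver":
        print(f"pid {proc.pid}: {proc.info['num_threads']} threads")
```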
I am not completely sure, but I believe this might explain the behavior: in the "test_timeout" test case, when I send 4 requests, the first 3 are picked up by those worker threads and only one is kept in the queue.
And when I set "max_queue_size", the same thing happens: 3 requests are taken by those workers, 1 is kept in the queue, and the 5th request is rejected because the queue is full.