
Disable pre-fetching when using queue policy #237

Merged: 3 commits merged into main on Aug 4, 2023
Conversation

tanmayv25 (Contributor) commented Aug 2, 2023

Disable pre-fetching so that the queue policy can be applied locally to the model queue.

Test PR: triton-inference-server/server#6133

Fixes triton-inference-server/server#5783

src/rate_limiter.cc
PayloadQueue* payload_queue = nullptr;
{
std::lock_guard<std::mutex> lk(payload_queues_mu_);
if (payload_queues_.find(model) == payload_queues_.end()) {
LOG_ERROR << "Should not print this! Waiting for the consumer for an "
Contributor

The error message should explain more:

("Waiting for a consumer which has no payload queue for model %s", model)

std::lock_guard<std::mutex> lk(payload_queues_mu_);
if (payload_queues_.find(model) == payload_queues_.end()) {
LOG_ERROR << "Should not print this! Waiting for the consumer for an "
"unknown model";
Contributor

Same as above: ("Waiting for a consumer which has no payload queue for model %s", model)

return 0;
}
payload_queue = payload_queues_[model].get();
}
Contributor

nit: This is a completely subjective opinion, but you could make getting the payload queue its own function. I think it reads fine either way and will not block this PR on this.
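The extraction suggested in this nit could look roughly like the sketch below. All names here (RateLimiterSketch, GetPayloadQueue, Register, the simplified PayloadQueue) are assumptions for illustration, not the actual Triton code:

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Simplified stand-in for the real PayloadQueue type.
struct PayloadQueue {
  int id = 0;
};

class RateLimiterSketch {
 public:
  // Extracted helper: looks up the payload queue for `model` under the
  // lock, returning nullptr if the model has no registered queue.
  PayloadQueue* GetPayloadQueue(const std::string& model)
  {
    std::lock_guard<std::mutex> lk(payload_queues_mu_);
    auto it = payload_queues_.find(model);
    return (it == payload_queues_.end()) ? nullptr : it->second.get();
  }

  // Registers an empty payload queue for `model`.
  void Register(const std::string& model)
  {
    std::lock_guard<std::mutex> lk(payload_queues_mu_);
    payload_queues_[model] = std::make_unique<PayloadQueue>();
  }

 private:
  std::mutex payload_queues_mu_;
  std::map<std::string, std::unique_ptr<PayloadQueue>> payload_queues_;
};
```

Callers would then check the returned pointer for nullptr and log the "no payload queue for model" error in one place, instead of repeating the find/lock pattern at each call site.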

}
{
std::lock_guard<std::mutex> lk(payload_queue->mu_);
auto multiplier = (model_instance == nullptr)
nv-kmcgill53 (Contributor) commented Aug 2, 2023

This auto is probably OK since it should get typed to size_t rather than an int. My style is to be explicit in these situations, to tell the reader of the code that we do not want integer overflow in the following lines.

I will preface this next comment with, "I don't know the implicit casting rules when using the < operator." This has the very small potential to be a pain in line 224 if multiplier is typed to int and (2 * multiplier) overflows. Again, this shouldn't be a problem since we don't expect multiplier to be greater than 2^32-1.
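The explicit-typing point can be illustrated with a toy example. QueueBound and its parameters are hypothetical, not code from this PR; the sketch only shows spelling out size_t so the (2 * multiplier) arithmetic is unambiguously unsigned and wide:

```cpp
#include <cstddef>

// Hypothetical illustration: compute a queue-size bound with an explicit
// size_t multiplier. Writing `size_t` instead of `auto` documents that the
// multiplication below is intended to happen in size_t, so the int literal 2
// is promoted and (2 * multiplier) cannot overflow as an int.
std::size_t QueueBound(bool per_instance, std::size_t instance_count)
{
  const std::size_t multiplier = per_instance ? 1 : instance_count;
  return 2 * multiplier;  // performed in size_t arithmetic
}
```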

@@ -190,6 +249,14 @@ RateLimiter::EnqueuePayload(
}
Contributor

Should probably update this error message as well.

 {
   std::lock_guard<std::mutex> lk(payload_queues_mu_);
-  if (payload_queues_.find(instances[0]->Model()) == payload_queues_.end()) {
+  if (payload_queues_.find(model) == payload_queues_.end()) {
     LOG_ERROR << "Should not print this! Dequeuing payload with an unknown "
                  "instance.";
Contributor

"Dequeuing a payload which has no queue for model " << model << "."

src/rate_limiter.cc
@tanmayv25 tanmayv25 requested a review from GuanLuo August 2, 2023 19:12
-bool PayloadSlotAvailable(const TritonModel* model);
+bool PayloadSlotAvailable(
+    const TritonModel* model, const TritonModelInstance* model_instance,
+    const bool support_prefetching, const bool force_non_blocking = false);
Contributor

document the variables

const bool support_prefetching, const bool force_non_blocking)
{
bool result;
if (support_prefetching) {
Contributor

Is force_non_blocking only used when !support_prefetching? Why doesn't this variable matter when support_prefetching is set?

Contributor Author

When pre-fetching, the call is always non-blocking.

Contributor

I guess my confusion is why it needs to block in some cases when pre-fetching is disabled. My impression is that pre-fetching or not only changes the "timing" of running out of available payload slots, so why does it change the scheduler behavior from always non-blocking to sometimes blocking?

tanmayv25 (Contributor Author) commented Aug 4, 2023

This is a good question. There is an asymmetry in the batcher thread depending on the pre-fetch setting.

The function PayloadSlotAvailable is called in two places. First, in the Enqueue function, to determine whether or not to wake up the batcher thread; there is no need to wake the batcher thread if no slot is available. Additionally, the call there should not block, as the thread is just enqueueing requests on the policy queue. This is symmetric whether pre-fetching is enabled or disabled.

In the batcher thread, the asymmetry is introduced by enabling pre-fetching. The batcher thread holds some requests in the curr_payload_ object without enqueuing the payload to the RateLimiter. Hence, even if no payload slot is available on the RateLimiter queue, we can keep building up requests on curr_payload_ and push the payload as soon as a slot in the RateLimiter appears.
When pre-fetching is not enabled, the batcher thread shouldn't pull any requests from the policy queue until there is an idle runner calling DequeuePayload on the RateLimiter. Hence, it is better to block the batcher from making any progress.

I didn't want to modify the pre-fetching logic, as it was implemented after careful performance optimization.
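The asymmetry described above can be sketched with a toy slot queue. Everything here (SlotQueue, SlotAvailable, Acquire, Release) is illustrative naming, not Triton's actual API; the sketch only shows the branch structure: non-blocking when pre-fetching or force_non_blocking is set, blocking on a condition variable otherwise:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Toy model of payload-slot availability (assumed names, not the real code).
struct SlotQueue {
  std::mutex mu;
  std::condition_variable cv;
  std::size_t occupied = 0;
  std::size_t capacity = 1;

  bool SlotAvailable(bool support_prefetching, bool force_non_blocking)
  {
    std::unique_lock<std::mutex> lk(mu);
    if (support_prefetching || force_non_blocking) {
      // Non-blocking: report whether a slot is free right now. With
      // pre-fetching the batcher can keep building curr_payload_ anyway.
      return occupied < capacity;
    }
    // Blocking: without pre-fetching, the batcher waits until an idle
    // runner dequeues a payload and frees a slot.
    cv.wait(lk, [this] { return occupied < capacity; });
    return true;
  }

  void Acquire()  // a payload takes a slot
  {
    std::lock_guard<std::mutex> lk(mu);
    ++occupied;
  }

  void Release()  // a runner frees a slot
  {
    std::lock_guard<std::mutex> lk(mu);
    if (occupied > 0) {
      --occupied;
    }
    cv.notify_one();
  }
};
```

The Enqueue-side caller would pass force_non_blocking = true so it never stalls while putting requests on the policy queue, matching the explanation above.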

dyastremsky previously approved these changes Aug 4, 2023
/// to allow function to return back with availability. \param model The
/// pointer to TritonModel object to query for . \param model_instance The
/// pointer to TritonMode \param support_prefetching Whether or not
/// pre-fetching of payloads is enabled. \param force_non_blocking When set
Contributor

Newlines?

GuanLuo previously approved these changes Aug 4, 2023
@tanmayv25 tanmayv25 dismissed stale reviews from GuanLuo and dyastremsky via 4ed596e August 4, 2023 21:37
@tanmayv25 tanmayv25 merged commit 66c61d2 into main Aug 4, 2023
1 check passed
@tanmayv25 tanmayv25 deleted the tanmayv-queue branch August 4, 2023 22:46
Successfully merging this pull request may close these issues.

Inaccurate request handling when configuring queue policy