Disable pre-fetching when using queue policy #237
Conversation
src/rate_limiter.cc
Outdated
PayloadQueue* payload_queue = nullptr;
{
  std::lock_guard<std::mutex> lk(payload_queues_mu_);
  if (payload_queues_.find(model) == payload_queues_.end()) {
    LOG_ERROR << "Should not print this! Waiting for the consumer for an "
Error message should explain more.
("Waiting for a consumer which has no payload queue for model %s", model)
src/rate_limiter.cc
Outdated
std::lock_guard<std::mutex> lk(payload_queues_mu_);
if (payload_queues_.find(model) == payload_queues_.end()) {
  LOG_ERROR << "Should not print this! Waiting for the consumer for an "
               "unknown model";
Same as above: ("Waiting for a consumer which has no payload queue for model %s", model)
    return 0;
  }
  payload_queue = payload_queues_[model].get();
}
nit: This is a completely subjective opinion, but you could make getting the payload queue its own function. I think it reads fine either way and will not block this PR on this.
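For illustration, the suggested refactor might look something like the sketch below. All names here (`RateLimiterSketch`, `GetPayloadQueue`, the simplified `PayloadQueue`) are hypothetical stand-ins for the real rate_limiter.cc code, which is not reproduced here.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the real PayloadQueue type.
struct PayloadQueue {
  std::mutex mu_;
};

class RateLimiterSketch {
 public:
  void AddModel(const std::string& model)
  {
    std::lock_guard<std::mutex> lk(payload_queues_mu_);
    payload_queues_[model] = std::make_unique<PayloadQueue>();
  }

  // The suggested helper: perform the lookup under the lock and return a raw
  // pointer, or nullptr when the model has no payload queue.
  PayloadQueue* GetPayloadQueue(const std::string& model)
  {
    std::lock_guard<std::mutex> lk(payload_queues_mu_);
    auto it = payload_queues_.find(model);
    return (it == payload_queues_.end()) ? nullptr : it->second.get();
  }

 private:
  std::mutex payload_queues_mu_;
  std::map<std::string, std::unique_ptr<PayloadQueue>> payload_queues_;
};
```

The caller can then branch on the nullptr result instead of repeating the find-under-lock pattern inline.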
src/rate_limiter.cc
Outdated
}
{
  std::lock_guard<std::mutex> lk(payload_queue->mu_);
  auto multiplier = (model_instance == nullptr)
This `auto` is probably ok since it should get typed to `size_t` rather than an `int`. My style is to be explicit in these situations, to tell the reader of the code that we do not want integer overflow in the following lines.
I will preface this next comment with, "I don't know the implicit casting rules when using the `<` operator." This has the very small potential to be a pain in line 224 if `multiplier` is typed to `int` and `(2 * multiplier)` overflows. Again, this shouldn't be a problem since we don't expect `multiplier` to be greater than 2^32-1.
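A minimal sketch of the explicit-typing style the reviewer is describing; the function name and the doubled-threshold comparison are illustrative, not the actual rate_limiter.cc logic.

```cpp
#include <cstddef>

// Keep both the multiplier and the comparison in std::size_t: the `<` then
// compares two unsigned operands with no implicit signed/unsigned
// conversion, and (2 * multiplier) is well-defined unsigned arithmetic
// rather than potential signed-int overflow (undefined behavior).
bool BelowDoubledThreshold(std::size_t queue_size, std::size_t multiplier)
{
  return queue_size < 2 * multiplier;
}
```

Writing `std::size_t multiplier` instead of `auto` makes this intent visible at the declaration site.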
@@ -190,6 +249,14 @@ RateLimiter::EnqueuePayload(
}
Should probably update this error message as well.
src/rate_limiter.cc
Outdated
{
  std::lock_guard<std::mutex> lk(payload_queues_mu_);
  if (payload_queues_.find(instances[0]->Model()) == payload_queues_.end()) {
  if (payload_queues_.find(model) == payload_queues_.end()) {
    LOG_ERROR << "Should not print this! Dequeuing payload with an unknown "
                 "instance.";
"Dequeuing a payload which has no queue for model " << model << "."
bool PayloadSlotAvailable(const TritonModel* model);
bool PayloadSlotAvailable(
    const TritonModel* model, const TritonModelInstance* model_instance,
    const bool support_prefetching, const bool force_non_blocking = false);
document the variables
    const bool support_prefetching, const bool force_non_blocking)
{
  bool result;
  if (support_prefetching) {
Is `force_non_blocking` only used when `!support_prefetching`? Why doesn't this variable matter when `support_prefetching` is set?
When pre-fetching, the call is always non-blocking.
I guess my confusion is why it needs to block in some cases when pre-fetching is disabled. My impression is that pre-fetching or not only changes the "timing" of having no available payload; why does it change the scheduler behavior from always non-blocking to sometimes blocking?
This is a good question. There exists an asymmetry in the batcher thread based upon the pre-fetch setting.
The function PayloadSlotAvailable is called in two places. First, in the Enqueue function, to determine whether or not to wake up the batcher thread. There is no need to wake up the batcher thread if there isn't a slot available. Additionally, the call here should not be blocking, as the thread is just enqueueing requests on the policy queue. This is symmetric for pre-fetch enabled as well as disabled.
In the batcher thread, the asymmetry is introduced by enabling pre-fetching. The batcher thread holds some requests in the curr_payload_ object without enqueuing the payload to the RateLimiter. Hence, even if no payload slot is available on the RateLimiter queue, we can keep building requests on curr_payload_ and push the payload as soon as a slot in the RateLimiter appears.
When pre-fetching is not enabled, the batcher thread shouldn't pull any requests from the policy queue until there is an idle runner calling DequeuePayload on the RateLimiter. Hence, it is better to block the batcher from making any progress.
I didn't want to modify the pre-fetching logic as it was implemented after careful performance optimization.
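The asymmetry described above can be sketched roughly as follows. This is a simplified stand-in, not the actual Triton implementation: `SlotGate`, its slot counter, and the helper methods are all hypothetical, and only the branching on `support_prefetching` / `force_non_blocking` mirrors the behavior discussed.

```cpp
#include <condition_variable>
#include <mutex>

class SlotGate {
 public:
  explicit SlotGate(int max_slots) : free_slots_(max_slots) {}

  bool PayloadSlotAvailable(bool support_prefetching, bool force_non_blocking)
  {
    std::unique_lock<std::mutex> lk(mu_);
    if (support_prefetching || force_non_blocking) {
      // Non-blocking probe: with pre-fetching the batcher keeps building
      // the next payload even without a free slot, and the Enqueue-side
      // call must never block the enqueueing thread.
      return free_slots_ > 0;
    }
    // Pre-fetching disabled: block the batcher until an idle runner frees
    // a slot, so requests stay on the policy queue until they can be served.
    cv_.wait(lk, [this] { return free_slots_ > 0; });
    return true;
  }

  bool TakeSlot()
  {
    std::lock_guard<std::mutex> lk(mu_);
    if (free_slots_ == 0) {
      return false;
    }
    --free_slots_;
    return true;
  }

  void ReleaseSlot()
  {
    {
      std::lock_guard<std::mutex> lk(mu_);
      ++free_slots_;
    }
    cv_.notify_one();  // wake a batcher blocked in the non-prefetch path
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int free_slots_;
};
```

The point of the sketch: the Enqueue-side probe and the pre-fetching batcher both take the non-blocking branch, while only the non-prefetching batcher ever waits on the condition variable.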
src/rate_limiter.h
Outdated
/// to allow function to return back with availability. \param model The
/// pointer to TritonModel object to query for . \param model_instance The
/// pointer to TritonMode \param support_prefetching Whether or not
/// pre-fetching of payloads is enabled. \param force_non_blocking When set
Newlines?
Disabling pre-fetching so queue policy can be locally applied to the model queue.
Test PR: triton-inference-server/server#6133
Fixes triton-inference-server/server#5783