
Disable pre-fetching when using queue policy #237

Merged: 3 commits merged into main on Aug 4, 2023
Conversation

tanmayv25 (Contributor) commented Aug 2, 2023

Disable pre-fetching so that the queue policy can be applied locally to the model queue.

Test PR: triton-inference-server/server#6133

Fixes triton-inference-server/server#5783

src/rate_limiter.cc
PayloadQueue* payload_queue = nullptr;
{
std::lock_guard<std::mutex> lk(payload_queues_mu_);
if (payload_queues_.find(model) == payload_queues_.end()) {
LOG_ERROR << "Should not print this! Waiting for the consumer for an "
Contributor

The error message should explain more:

("Waiting for a consumer which has no payload queue for model %s", model)

std::lock_guard<std::mutex> lk(payload_queues_mu_);
if (payload_queues_.find(model) == payload_queues_.end()) {
LOG_ERROR << "Should not print this! Waiting for the consumer for an "
"unknown model";
Contributor

Same as above: ("Waiting for a consumer which has no payload queue for model %s", model)

return 0;
}
payload_queue = payload_queues_[model].get();
}
Contributor

nit: This is a completely subjective opinion, but you could make getting the payload queue its own function. I think it reads fine either way and will not block this PR on this.
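The extraction suggested in this nit could look roughly like the sketch below. All names here (RateLimiterSketch, GetPayloadQueue, Register, the simplified PayloadQueue) are assumptions for illustration, not the actual Triton code:

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Simplified stand-in for the real PayloadQueue type.
struct PayloadQueue {
  int id = 0;
};

class RateLimiterSketch {
 public:
  // Extracted helper: looks up the payload queue for `model` under the
  // lock, returning nullptr if the model has no registered queue.
  PayloadQueue* GetPayloadQueue(const std::string& model)
  {
    std::lock_guard<std::mutex> lk(payload_queues_mu_);
    auto it = payload_queues_.find(model);
    return (it == payload_queues_.end()) ? nullptr : it->second.get();
  }

  // Registers an empty payload queue for `model`.
  void Register(const std::string& model)
  {
    std::lock_guard<std::mutex> lk(payload_queues_mu_);
    payload_queues_[model] = std::make_unique<PayloadQueue>();
  }

 private:
  std::mutex payload_queues_mu_;
  std::map<std::string, std::unique_ptr<PayloadQueue>> payload_queues_;
};
```

Callers would then check the returned pointer for nullptr and log the "no payload queue for model" error in one place, instead of repeating the find/lock pattern at each call site.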

}
{
std::lock_guard<std::mutex> lk(payload_queue->mu_);
auto multiplier = (model_instance == nullptr)
nv-kmcgill53 (Contributor) commented Aug 2, 2023

This auto is probably OK since it should get typed to size_t rather than an int. My style is to be explicit in these situations, to tell the reader of the code that we do not want integer overflow in the following lines.

I will preface this next comment with, "I don't know the implicit casting rules when using the < operator." This has the very small potential to be a pain in line 224 if multiplier is typed to int and (2 * multiplier) overflows. Again, this shouldn't be a problem since we don't expect multiplier to be greater than 2^32-1.
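The explicit-typing point can be illustrated with a toy example. QueueBound and its parameters are hypothetical, not code from this PR; the sketch only shows spelling out size_t so the (2 * multiplier) arithmetic is unambiguously unsigned and wide:

```cpp
#include <cstddef>

// Hypothetical illustration: compute a queue-size bound with an explicit
// size_t multiplier. Writing `size_t` instead of `auto` documents that the
// multiplication below is intended to happen in size_t, so the int literal 2
// is promoted and (2 * multiplier) cannot overflow as an int.
std::size_t QueueBound(bool per_instance, std::size_t instance_count)
{
  const std::size_t multiplier = per_instance ? 1 : instance_count;
  return 2 * multiplier;  // performed in size_t arithmetic
}
```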

@@ -190,6 +249,14 @@ RateLimiter::EnqueuePayload(
}
Contributor

Should probably update this error message as well.

 {
   std::lock_guard<std::mutex> lk(payload_queues_mu_);
-  if (payload_queues_.find(instances[0]->Model()) == payload_queues_.end()) {
+  if (payload_queues_.find(model) == payload_queues_.end()) {
     LOG_ERROR << "Should not print this! Dequeuing payload with an unknown "
                  "instance.";
Contributor

"Dequeuing a payload which has no queue for model " << model << "."

src/rate_limiter.cc
@tanmayv25 tanmayv25 requested a review from GuanLuo August 2, 2023 19:12
-bool PayloadSlotAvailable(const TritonModel* model);
+bool PayloadSlotAvailable(
+    const TritonModel* model, const TritonModelInstance* model_instance,
+    const bool support_prefetching, const bool force_non_blocking = false);
Contributor

document the variables

const bool support_prefetching, const bool force_non_blocking)
{
bool result;
if (support_prefetching) {
Contributor

Is force_non_blocking only used when !support_prefetching? Why doesn't this variable matter when support_prefetching is set?

Contributor Author

When pre-fetching, the call is always non-blocking.

Contributor

I guess my confusion is why it needs to block in some cases when pre-fetching is disabled. My impression is that pre-fetching or not only changes the "timing" of running out of available payload slots, so why does it change the scheduler behavior from always non-blocking to sometimes blocking?

tanmayv25 (Contributor Author) commented Aug 4, 2023

This is a good question. There is an asymmetry in the batcher thread depending on the pre-fetch setting.

The function PayloadSlotAvailable is called in two places. First, in the Enqueue function, to determine whether or not to wake up the batcher thread; there is no need to wake the batcher thread if no slot is available. Additionally, the call there should not block, as the thread is just enqueueing requests on the policy queue. This is symmetric whether pre-fetching is enabled or disabled.

In the batcher thread, the asymmetry is introduced by enabling pre-fetching. The batcher thread holds some requests in the curr_payload_ object without enqueuing the payload to the RateLimiter. Hence, even if no payload slot is available on the RateLimiter queue, we can keep building up requests on curr_payload_ and push the payload as soon as a slot in the RateLimiter appears.
When pre-fetching is not enabled, the batcher thread shouldn't pull any requests from the policy queue until there is an idle runner calling DequeuePayload on the RateLimiter. Hence, it is better to block the batcher from making any progress.

I didn't want to modify the pre-fetching logic, as it was implemented after careful performance optimization.
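The asymmetry described above can be sketched with a toy slot queue. Everything here (SlotQueue, SlotAvailable, Acquire, Release) is illustrative naming, not Triton's actual API; the sketch only shows the branch structure: non-blocking when pre-fetching or force_non_blocking is set, blocking on a condition variable otherwise:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Toy model of payload-slot availability (assumed names, not the real code).
struct SlotQueue {
  std::mutex mu;
  std::condition_variable cv;
  std::size_t occupied = 0;
  std::size_t capacity = 1;

  bool SlotAvailable(bool support_prefetching, bool force_non_blocking)
  {
    std::unique_lock<std::mutex> lk(mu);
    if (support_prefetching || force_non_blocking) {
      // Non-blocking: report whether a slot is free right now. With
      // pre-fetching the batcher can keep building curr_payload_ anyway.
      return occupied < capacity;
    }
    // Blocking: without pre-fetching, the batcher waits until an idle
    // runner dequeues a payload and frees a slot.
    cv.wait(lk, [this] { return occupied < capacity; });
    return true;
  }

  void Acquire()  // a payload takes a slot
  {
    std::lock_guard<std::mutex> lk(mu);
    ++occupied;
  }

  void Release()  // a runner frees a slot
  {
    std::lock_guard<std::mutex> lk(mu);
    if (occupied > 0) {
      --occupied;
    }
    cv.notify_one();
  }
};
```

The Enqueue-side caller would pass force_non_blocking = true so it never stalls while putting requests on the policy queue, matching the explanation above.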

dyastremsky previously approved these changes Aug 4, 2023
/// to allow function to return back with availability. \param model The
/// pointer to TritonModel object to query for . \param model_instance The
/// pointer to TritonMode \param support_prefetching Whether or not
/// pre-fetching of payloads is enabled. \param force_non_blocking When set
Contributor

Newlines?

GuanLuo previously approved these changes Aug 4, 2023
@tanmayv25 tanmayv25 dismissed stale reviews from GuanLuo and dyastremsky via 4ed596e August 4, 2023 21:37
@tanmayv25 tanmayv25 merged commit 66c61d2 into main Aug 4, 2023
1 check passed
@tanmayv25 tanmayv25 deleted the tanmayv-queue branch August 4, 2023 22:46
Successfully merging this pull request may close these issues.

Inaccurate request handling when configuring queue policy