Add BackendAttribute for parallel model instance loading #235

Merged
rmccorm4 merged 9 commits into main from rmccormick-optin on Jul 26, 2023


rmccorm4 commented Jul 21, 2023

The attribute currently defaults to false, so no parallel loading happens unless a backend implements the attribute and calls the API to enable it. This PR is only the scaffolding for that capability.


The Identity Backend opts in as a sanity check in existing pipelines: triton-inference-server/identity_backend#26

I'm seeing a ballpark 2-3x speedup on my machine when loading 100 instances with the Identity Backend. I expect the speedup to be more meaningful for model instances that take longer to initialize in more complicated backends. I'll do a more thorough performance analysis after enabling support in other backends.
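
For readers skimming the thread, here is a minimal, self-contained sketch of the opt-in idea described above: a boolean attribute that defaults to false, with instances loaded serially unless it is set. It does not use the Triton headers or API. `BackendAttribute`, `InitInstance`, and `LoadInstances` are hypothetical names made up for illustration, and the sleep stands in for real initialization work.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for a backend attribute (NOT the Triton struct).
// The default mirrors the PR description: parallel loading is off unless
// a backend opts in.
struct BackendAttribute {
  bool parallel_instance_loading = false;
};

// Hypothetical instance initializer. The sleep simulates initialization cost.
int InitInstance(int id, std::chrono::milliseconds cost) {
  std::this_thread::sleep_for(cost);
  return id;
}

// Loads `count` instances, serially by default, or concurrently when the
// attribute is enabled. Results are returned in instance-id order either way.
std::vector<int> LoadInstances(
    const BackendAttribute& attr, int count, std::chrono::milliseconds cost) {
  assert(count >= 0);
  std::vector<int> loaded;
  loaded.reserve(static_cast<std::size_t>(count));

  if (!attr.parallel_instance_loading) {
    // Serial path: each instance finishes before the next one starts.
    for (int i = 0; i < count; ++i) {
      loaded.push_back(InitInstance(i, cost));
    }
    return loaded;
  }

  // Parallel path: launch every initialization, then collect in order.
  std::vector<std::future<int>> pending;
  pending.reserve(static_cast<std::size_t>(count));
  for (int i = 0; i < count; ++i) {
    pending.push_back(std::async(std::launch::async, InitInstance, i, cost));
  }
  for (auto& f : pending) {
    loaded.push_back(f.get());
  }
  return loaded;
}
```

Calling `LoadInstances` with a default-constructed `BackendAttribute` takes the serial path. Setting `parallel_instance_loading = true` takes the concurrent path and yields the same instances in the same order. One thread per instance keeps the sketch short; a real implementation would more likely bound the concurrency.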

rmccorm4 requested a review from GuanLuo July 21, 2023 21:59

Tabrizian left a comment


The API LGTM.

Review threads:
include/triton/core/tritonbackend.h (outdated, resolved)
src/backend_model.cc (resolved)
rmccorm4 marked this pull request as ready for review July 25, 2023 00:06

Tabrizian left a comment


LGTM.

rmccorm4 merged commit 9714cd6 into main Jul 26, 2023
1 check passed
rmccorm4 deleted the rmccormick-optin branch July 26, 2023 19:27