
Add Splitwise: prompt and token phase separation #2472

Closed

goiri opened this issue Jan 18, 2024 · 5 comments
goiri commented Jan 18, 2024

We have built the system described at http://aka.ms/splitwise
Splitwise splits the prompt and token phases so that they run on different servers.
This leverages the different characteristics of the two phases to improve throughput.
We have an internal prototype built on top of an internal vLLM branch.
This issue tracks the effort to open-source that prototype and make it part of official vLLM.

This includes:
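To illustrate the idea, here is a minimal sketch of prompt/token phase separation, assuming a prompt (prefill) server hands off its KV cache to a separate token (decode) server. All class and function names are hypothetical and the "model" arithmetic is a stand-in; this is not the actual Splitwise or vLLM implementation.

```python
from dataclasses import dataclass


@dataclass
class KVCache:
    """Stand-in for the attention KV cache built during prefill."""
    tokens: list


class PromptServer:
    """Runs the compute-bound prefill (prompt) phase."""

    def prefill(self, prompt_tokens):
        # Build the KV cache over the full prompt and emit the first token.
        kv = KVCache(tokens=list(prompt_tokens))
        first_token = sum(prompt_tokens) % 100  # stand-in for model output
        return kv, first_token


class TokenServer:
    """Runs the memory-bound decode (token) phase."""

    def decode(self, kv, first_token, max_new_tokens):
        out = [first_token]
        for _ in range(max_new_tokens - 1):
            # One token per step, appending to the transferred KV cache.
            nxt = (out[-1] + len(kv.tokens)) % 100  # stand-in for model output
            kv.tokens.append(nxt)
            out.append(nxt)
        return out


def generate(prompt_tokens, max_new_tokens=4):
    prompt_srv, token_srv = PromptServer(), TokenServer()
    # Phase 1: prefill on the prompt server.
    kv, first = prompt_srv.prefill(prompt_tokens)
    # The KV cache is then transferred from the prompt server to the
    # token server (in a real deployment, over the network).
    # Phase 2: decode on the token server.
    return token_srv.decode(kv, first, max_new_tokens)


print(generate([1, 2, 3]))
```

The point of the split is that each server pool can be sized and provisioned for its phase independently, at the cost of shipping the KV cache between them.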


goiri commented Jan 18, 2024

This was asked in #2370.


irasin commented Jan 18, 2024

LGTM. I was wondering when we can use it in vLLM?


goiri commented Jan 18, 2024

@irasin, @aashaka is doing some cleanup and refactoring and will be posting the PRs in the next few weeks.
We will update this issue (and link the PRs) as progress is made.

adney11 commented Feb 7, 2024

Hi All,

Just wanted to check in and see whether there is any update on Splitwise's implementation in vLLM, and whether the internal prototype codebase can be released?

Thank you!

aashaka commented Feb 8, 2024

This has now been released in PR #2809. @adney11, @irasin

hmellor closed this as not planned on Apr 4, 2024.