
Add Splitwise: prompt and token phase separation #2472

Closed

goiri opened this issue Jan 18, 2024 · 5 comments
goiri commented Jan 18, 2024

We have built the system described at http://aka.ms/splitwise
Splitwise splits the prompt and token phases so that they run on different servers.
This leverages the different characteristics of the two phases to improve throughput.
We have an internal prototype built on top of an internal vLLM branch.
This issue tracks the effort to open-source that prototype and make it part of official vLLM.

This includes:
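To illustrate the idea, here is a minimal sketch of prompt/token phase separation, assuming a prompt (prefill) server hands off its KV cache to a separate token (decode) server. All class and function names are hypothetical and the "model" arithmetic is a stand-in; this is not the actual Splitwise or vLLM implementation.

```python
from dataclasses import dataclass


@dataclass
class KVCache:
    """Stand-in for the attention KV cache built during prefill."""
    tokens: list


class PromptServer:
    """Runs the compute-bound prefill (prompt) phase."""

    def prefill(self, prompt_tokens):
        # Build the KV cache over the full prompt and emit the first token.
        kv = KVCache(tokens=list(prompt_tokens))
        first_token = sum(prompt_tokens) % 100  # stand-in for model output
        return kv, first_token


class TokenServer:
    """Runs the memory-bound decode (token) phase."""

    def decode(self, kv, first_token, max_new_tokens):
        out = [first_token]
        for _ in range(max_new_tokens - 1):
            # One token per step, appending to the transferred KV cache.
            nxt = (out[-1] + len(kv.tokens)) % 100  # stand-in for model output
            kv.tokens.append(nxt)
            out.append(nxt)
        return out


def generate(prompt_tokens, max_new_tokens=4):
    prompt_srv, token_srv = PromptServer(), TokenServer()
    # Phase 1: prefill on the prompt server.
    kv, first = prompt_srv.prefill(prompt_tokens)
    # The KV cache is then transferred from the prompt server to the
    # token server (in a real deployment, over the network).
    # Phase 2: decode on the token server.
    return token_srv.decode(kv, first, max_new_tokens)


print(generate([1, 2, 3]))
```

The point of the split is that each server pool can be sized and provisioned for its phase independently, at the cost of shipping the KV cache between them.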


goiri commented Jan 18, 2024

This was asked in #2370.


irasin commented Jan 18, 2024

LGTM. I was wondering when we can use it in vLLM?


goiri commented Jan 18, 2024

@irasin, @aashaka is doing some cleanup and refactoring and will be posting the PRs in the next few weeks.
We will update this issue (and link the PRs) as progress is made.

adney11 commented Feb 7, 2024

Hi All,

Just wanted to check in and see whether there is any update on Splitwise's implementation in vLLM, and whether the internal prototype codebase can be released?

Thank you!

aashaka commented Feb 8, 2024

This has now been released in PR #2809. @adney11, @irasin

hmellor closed this as not planned on Apr 4, 2024.