Yes, SplitFuse is an impressive advancement from DeepSpeed, and one we were also working on! Our implementation will be a little different due to differences in batching strategies, but the idea of chunking the prefill is the same. It will likely land in one of the next few releases.
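For readers unfamiliar with the idea, here is a minimal sketch of what "chunking the prefill" can look like at the scheduling level. It is not DeepSpeed's or TensorRT-LLM's actual implementation; `plan_steps`, `num_decoding`, and `token_budget` are hypothetical names chosen for illustration. It assumes each forward pass has a fixed token budget, that every sequence already in decode consumes one token per pass, and that the leftover budget is filled with pieces of waiting prompts in FIFO order. As a simplification, a prompt that finishes prefill is not added to the decode set.

```python
from collections import deque


def plan_steps(prompts, num_decoding, token_budget):
    """Plan forward passes that mix decode tokens with prefill chunks.

    prompts:      dict mapping a request name to its prompt length in tokens
    num_decoding: sequences already generating (one token each per pass)
    token_budget: maximum tokens processed in a single forward pass
    Returns a list of steps; each step is a list of (name, n_tokens) pairs.
    """
    if token_budget <= num_decoding:
        raise ValueError("token_budget must leave room for prefill tokens")

    pending = deque(prompts.items())  # FIFO queue of (name, tokens left)
    steps = []
    while pending:
        # Decode tokens are reserved first; the rest goes to prefill chunks.
        budget = token_budget - num_decoding
        step = [("decode", num_decoding)] if num_decoding else []
        while pending and budget > 0:
            name, remaining = pending.popleft()
            take = min(remaining, budget)
            step.append((name, take))
            budget -= take
            if remaining > take:
                # Prompt did not fit: the rest waits for the next pass.
                pending.appendleft((name, remaining - take))
        steps.append(step)
    return steps


if __name__ == "__main__":
    for i, step in enumerate(plan_steps({"a": 10, "b": 3}, 2, 8), start=1):
        print(i, step)
```

In this toy model a long prompt is spread over several passes instead of stalling the sequences that are already decoding, and every pass stays at or below the same token count.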
Hi, putting this here:
https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen
The latency and throughput improvements are significant, though the comparisons are against vLLM. It seems TRT does batching a bit differently, so I'm unsure whether this applies equally here.