[Speculative decoding] Support target-model logprobs #4378
Conversation
btw, warning: there will be a big sampler refactoring in this PR: #4309
thanks for the heads up; I think I can keep it decoupled
This looks good to me. Have you run profiling to see the perf impact?
@cadedaniel can we get this merged today?
@richardliaw yep. @Yard1 I benchmarked and there is room to optimize; I feel we should follow up once we have E2E spec decode numbers (the implementation is reasonably efficient).
This PR allows vLLM to return correct log-probabilities of sampled tokens when speculative decoding is enabled. In addition, if the user specifies `logprobs` in their request, the correct top-k logprobs are returned. The log-probabilities are expected to equal those produced when speculative decoding is not used.
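The invariant being tested is that the reported values are the log-softmax of the target model's logits for each position, regardless of whether the token was accepted from the draft model. A minimal standalone sketch of that quantity (plain Python, not vLLM code; the function names are illustrative):

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def topk_logprobs(logits, k):
    # Return the top-k (token_id, logprob) pairs, as a server would
    # for a request that sets `logprobs=k`.
    lps = log_softmax(logits)
    ranked = sorted(enumerate(lps), key=lambda t: t[1], reverse=True)
    return ranked[:k]

# Example: the probabilities implied by the logprobs must sum to 1,
# and the top-1 entry is the highest-logit token.
logits = [2.0, 1.0, 0.1]
lps = log_softmax(logits)
assert abs(sum(math.exp(lp) for lp in lps) - 1.0) < 1e-9
assert topk_logprobs(logits, 2)[0][0] == 0
```

Equality with the non-speculative path can then be checked by comparing these values token-by-token between a run with speculative decoding enabled and one without.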
Testing