[Speculative decoding] Support target-model logprobs #4378
Conversation
btw, warning: there will be a big sampler refactoring in this PR: #4309
thanks for the heads up; I think I can keep it decoupled
This looks good to me. Have you run profiling to see the perf impact?
@cadedaniel can we get this merged today?
@richardliaw yep. @Yard1 I benchmarked and there is room to optimize; I feel we should follow up once we have E2E spec decode numbers (the implementation is reasonably efficient).
This PR allows vLLM to return correct log-probabilities of sampled tokens when speculative decoding is enabled. In addition, if the user specifies `logprobs` in their request, the correct top-k logprobs are returned. The log-probabilities are expected to equal those produced when speculative decoding is not used.
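The invariant being tested is that the reported values are the log-softmax of the target model's logits for each position, regardless of whether the token was accepted from the draft model. A minimal standalone sketch of that quantity (plain Python, not vLLM code; the function names are illustrative):

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def topk_logprobs(logits, k):
    # Return the top-k (token_id, logprob) pairs, as a server would
    # for a request that sets `logprobs=k`.
    lps = log_softmax(logits)
    ranked = sorted(enumerate(lps), key=lambda t: t[1], reverse=True)
    return ranked[:k]

# Example: the probabilities implied by the logprobs must sum to 1,
# and the top-1 entry is the highest-logit token.
logits = [2.0, 1.0, 0.1]
lps = log_softmax(logits)
assert abs(sum(math.exp(lp) for lp in lps) - 1.0) < 1e-9
assert topk_logprobs(logits, 2)[0][0] == 0
```

Equality with the non-speculative path can then be checked by comparing these values token-by-token between a run with speculative decoding enabled and one without.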
Testing