Description
I've been working toward generalizing the heatmap (Figure 1 in the paper) so that we can plug in online simulations instead of only offline ones. Computing the heatmap requires running a simulation for every configuration in a grid. Unfortunately, our current online simulation implementation doesn't support small acceptance rates. This not only prevents us from creating heatmaps but also reveals a bug 🥲
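For concreteness, a heatmap sweep amounts to a loop over the configuration grid along these lines. `run_online_simulation` and the parameter names here are placeholders I'm making up for illustration, not the actual API:

```python
import itertools

def run_online_simulation(acceptance_rate: float, c: float) -> float:
    """Stand-in for the real online simulator entry point (hypothetical)."""
    return 0.0  # would return, e.g., the simulated latency

# Hypothetical grid; the real heatmap axes may differ.
acceptance_rates = [0.01, 0.1, 0.5, 0.9]
drafter_latencies = [0.01, 0.05, 0.1]

heatmap = {
    (a, c): run_online_simulation(acceptance_rate=a, c=c)
    for a, c in itertools.product(acceptance_rates, drafter_latencies)
}
```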
Thanks to edge case 1 described below, generating `S` tokens shouldn't take much longer than `S * (failure_cost + c + wait_for_pipe)`. However, our current implementation ignores this edge case, causing the latency to scale with `1/acceptance_rate`. For example, generating `S=77` tokens with `acceptance_rate=0.01` requires approximately 7700 iterations instead of `<= 77`. For reference, I added tests covering the issue (see the two skipped tests in `tests/online/test_simul.py`, named `test_correct_token_count_per_iteration` and `test_duration`).
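To see where the ~7700 comes from: if the simulation only advances when a draft token is accepted, the expected number of iterations is `S / acceptance_rate`. A quick sanity check (assuming exactly one accepted token per successful iteration):

```python
S = 77
acceptance_rate = 0.01

# Current behavior: progress only on acceptance, so latency scales
# with 1/acceptance_rate.
expected_iterations_now = S / acceptance_rate  # 7700.0

# With edge case 1 handled, every iteration yields at least one token,
# so at most S iterations are needed.
iteration_bound = S  # 77
```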
The edge cases (from a private email I sent on July 15):
DSI has two edge cases that boost its speedup, but both are overlooked in our online simulations. The two changes suggested below will improve the speedups reported in the paper (affecting Table 1 and Figure 1); see the sketch after this list for how they could be simulated.
- Accept an extra token if the target rejects a draft or if the extra token is the last.
- Simulate an immediate validation of the extra token by terminating the corresponding speculating iteration with probability `1 - acceptance_rate`.
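Here is a minimal sketch of how the two rules could be wired into one simulated validation step. All names (`simulated_validation`, `drafts_accepted`, `lookahead`) are hypothetical and don't mirror the codebase; it's meant to pin down the intended semantics, not to be the implementation:

```python
import random

def simulated_validation(drafts_accepted: int, lookahead: int,
                         is_last: bool, acceptance_rate: float):
    """One validation step of a hypothetical online simulation.

    Returns (tokens_produced, terminate_speculation).
    """
    tokens_produced = drafts_accepted
    rejected_a_draft = drafts_accepted < lookahead
    terminate_speculation = False

    # Edge case 1: accept an extra (target-generated) token if the target
    # rejects a draft, or if the extra token is the last token.
    if rejected_a_draft or is_last:
        tokens_produced += 1
        # Edge case 2: the extra token is validated immediately, so the
        # speculating iteration that builds on it terminates with
        # probability 1 - acceptance_rate.
        terminate_speculation = random.random() < (1 - acceptance_rate)

    return tokens_produced, terminate_speculation
```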
For more context, please see my PR comment.