Description
I've been working toward generalizing the heatmap (Figure 1 in the paper) so that we can plug in online simulations instead of only offline ones. Computing the heatmap requires running a simulation for every configuration in a grid. Unfortunately, our current online simulation implementation doesn't support small acceptance rates. This not only prevents us from creating heatmaps but also reveals a bug 🥲
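For concreteness, a heatmap sweep amounts to a loop over the configuration grid along these lines. `run_online_simulation` and the parameter names here are placeholders I'm making up for illustration, not the actual API:

```python
import itertools

def run_online_simulation(acceptance_rate: float, c: float) -> float:
    """Stand-in for the real online simulator entry point (hypothetical)."""
    return 0.0  # would return, e.g., the simulated latency

# Hypothetical grid; the real heatmap axes may differ.
acceptance_rates = [0.01, 0.1, 0.5, 0.9]
drafter_latencies = [0.01, 0.05, 0.1]

heatmap = {
    (a, c): run_online_simulation(acceptance_rate=a, c=c)
    for a, c in itertools.product(acceptance_rates, drafter_latencies)
}
```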
Thanks to edge case 1 described below, generating `S` tokens shouldn't take much longer than `S * (failure_cost + c + wait_for_pipe)`. However, our current implementation ignores this edge case, causing the latency to scale with `1/acceptance_rate`. For example, generating `S=77` tokens with `acceptance_rate=0.01` requires approximately 7700 iterations instead of `<= 77`. For reference, I added tests covering the issue (see the two skipped tests in `tests/online/test_simul.py`, named `test_correct_token_count_per_iteration` and `test_duration`).
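To see where the ~7700 comes from: if the simulation only advances when a draft token is accepted, the expected number of iterations is `S / acceptance_rate`. A quick sanity check (assuming exactly one accepted token per successful iteration):

```python
S = 77
acceptance_rate = 0.01

# Current behavior: progress only on acceptance, so latency scales
# with 1/acceptance_rate.
expected_iterations_now = S / acceptance_rate  # 7700.0

# With edge case 1 handled, every iteration yields at least one token,
# so at most S iterations are needed.
iteration_bound = S  # 77
```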
The edge cases (from a private email I sent on July 15):
DSI has two edge cases that boost its speedup, but both are overlooked in our online simulations. The two changes suggested below will improve the speedups reported in the paper (affecting Table 1 and Figure 1); see the sketch after this list for how they could be simulated.
- Accept an extra token if the target rejects a draft or if the extra token is the last.
- Simulate an immediate validation of the extra token by terminating the corresponding speculating iteration with probability `1 - acceptance_rate`.
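Here is a minimal sketch of how the two rules could be wired into one simulated validation step. All names (`simulated_validation`, `drafts_accepted`, `lookahead`) are hypothetical and don't mirror the codebase; it's meant to pin down the intended semantics, not to be the implementation:

```python
import random

def simulated_validation(drafts_accepted: int, lookahead: int,
                         is_last: bool, acceptance_rate: float):
    """One validation step of a hypothetical online simulation.

    Returns (tokens_produced, terminate_speculation).
    """
    tokens_produced = drafts_accepted
    rejected_a_draft = drafts_accepted < lookahead
    terminate_speculation = False

    # Edge case 1: accept an extra (target-generated) token if the target
    # rejects a draft, or if the extra token is the last token.
    if rejected_a_draft or is_last:
        tokens_produced += 1
        # Edge case 2: the extra token is validated immediately, so the
        # speculating iteration that builds on it terminates with
        # probability 1 - acceptance_rate.
        terminate_speculation = random.random() < (1 - acceptance_rate)

    return tokens_produced, terminate_speculation
```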
For more context, please see my PR comment.