
Online simulation for small acceptance rates #37

Open
@keyboardAnt


I've been working toward generalizing the heatmap (Figure 1 in the paper) so that we can plug in online simulations instead of only offline ones. Calculating the heatmap requires running simulations over a grid of configurations. Unfortunately, our current online simulation implementation doesn't support small acceptance rates. This not only prevents us from generating heatmaps but also reveals a bug 🥲

Thanks to edge case 1 described below, generating S tokens shouldn't take much longer than `S * (failure_cost + c + wait_for_pipe)`. However, our current implementation ignores this edge case, so the latency scales with `1/acceptance_rate`. For example, generating `S=77` tokens with `acceptance_rate=0.01` requires approximately 7700 iterations instead of at most 77. For reference, I added tests covering the issue: see the two skipped tests in `tests/online/test_simul.py`, named `test_correct_token_count_per_iteration` and `test_duration`.
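The arithmetic behind the 7700-vs-77 comparison can be sketched as follows. This is a toy model, not the simulator's code: it assumes that, without edge case 1, an iteration commits a token only when the target accepts a draft, and all function names here are hypothetical.

```python
def iterations_without_extra_token(S: int, acceptance_rate: float) -> float:
    # Toy model of the reported behavior (assumption): an iteration commits a token
    # only when a draft is accepted, so each token costs 1 / acceptance_rate
    # iterations on average.
    return S / acceptance_rate


def max_iterations_with_extra_token(S: int) -> int:
    # Edge case 1: every iteration commits at least one token (an accepted draft
    # or the target's extra token), so S tokens need at most S iterations.
    return S


def latency_bound(S: int, failure_cost: float, c: float, wait_for_pipe: float) -> float:
    # Rough bound from the issue: at most S iterations, each costing about
    # failure_cost + c + wait_for_pipe (in whatever time unit the simulator uses).
    return S * (failure_cost + c + wait_for_pipe)


print(iterations_without_extra_token(77, 0.01))
print(max_iterations_with_extra_token(77))
```

The first quantity grows with `1/acceptance_rate`, while the second depends only on `S`, which is why the latency should stay roughly linear in `S` once edge case 1 is simulated.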

The edge cases (from a private email I sent on July 15):

Two edge cases of DSI boost its speedup but are overlooked in our online simulations. The two suggested changes below will improve the speedups reported in the paper (affecting Table 1 and Figure 1).

  1. Accept an extra token if the target rejects a draft or if the extra token is the last.
  2. Simulate an immediate validation of the extra token by terminating the corresponding speculating iteration with probability 1 - acceptance_rate.
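The two rules above can be sketched in a toy model that is not the repository's simulator. Assumptions: each verification checks up to `num_drafts` drafts, each accepted independently with probability `acceptance_rate`; when every draft is accepted, the target's extra token is left to the next iteration unless it is the final token. All names (`verify`, `extra_token_terminates_speculation`, `count_iterations`, `num_drafts`) are hypothetical.

```python
import random


def verify(num_drafts: int, acceptance_rate: float, tokens_so_far: int,
           S: int, rng: random.Random) -> tuple[int, bool]:
    """One target verification in a toy model with i.i.d. draft acceptance.

    Returns (tokens committed, whether a draft was rejected).
    """
    if num_drafts < 1:
        raise ValueError("num_drafts must be at least 1")
    # Count drafts accepted before the first rejection.
    accepted = 0
    while accepted < num_drafts and rng.random() < acceptance_rate:
        accepted += 1
    rejected = accepted < num_drafts
    # Edge case 1: also commit the target's extra token if it rejected a draft,
    # or if that extra token would be the last token of the sequence.
    extra_is_last = tokens_so_far + accepted + 1 == S
    if rejected or extra_is_last:
        accepted += 1
    # Never commit more tokens than remain to be generated.
    return min(accepted, S - tokens_so_far), rejected


def extra_token_terminates_speculation(acceptance_rate: float,
                                       rng: random.Random) -> bool:
    """Edge case 2: validate the extra token immediately by terminating the
    corresponding speculating iteration with probability 1 - acceptance_rate."""
    return rng.random() < 1 - acceptance_rate


def count_iterations(S: int, num_drafts: int, acceptance_rate: float,
                     seed: int = 0) -> tuple[int, int]:
    """Returns (verification iterations, speculating iterations terminated early)."""
    rng = random.Random(seed)
    tokens = iterations = terminated = 0
    while tokens < S:
        committed, rejected = verify(num_drafts, acceptance_rate, tokens, S, rng)
        tokens += committed
        iterations += 1
        # Edge case 2 only applies when the target produced an extra token.
        if rejected and extra_token_terminates_speculation(acceptance_rate, rng):
            terminated += 1
    return iterations, terminated


print(count_iterations(S=77, num_drafts=5, acceptance_rate=0.01))
```

Because every call to `verify` commits at least one token, `count_iterations` never needs more than `S` iterations, even at `acceptance_rate=0.01`, which matches the "at most 77" expectation for `S=77`.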

For more context, please see my PR comment.

Labels: bug (Something isn't working)