[Bugfix] Add warmup for prefix caching example (vllm-project#5235)
zhuohan123 authored and jimpang committed Jul 8, 2024
1 parent 7e2e3e4 commit 1a44ece
Showing 1 changed file with 4 additions and 2 deletions.
examples/offline_inference_with_prefix.py (4 additions, 2 deletions)

@@ -51,8 +51,10 @@
 
 print("-" * 80)
 
-# The llm.generate call will batch all prompts and send the batch at once
-# if resources allow.
+# Warmup so that the shared prompt's KV cache is computed.
+prefix_cached_llm.generate(generating_prompts[0], sampling_params)
+
+# Generate with prefix caching.
 start_time_cached = time()
 outputs = prefix_cached_llm.generate(generating_prompts, sampling_params)
 duration_cached = time() - start_time_cached
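The diff moves the one-time cost of computing the shared prefix's KV cache out of the timed region: a single warmup generation populates the cache, so the subsequent timed batch only reuses it. Below is a minimal sketch of that pattern, assuming vLLM's offline LLM API with enable_prefix_caching; the model name, prefix, and prompts are illustrative placeholders rather than the example's actual values.

# Sketch of the warmup-then-time pattern from this commit (placeholder data).
from time import time

from vllm import LLM, SamplingParams

# A shared prefix followed by per-request questions (illustrative only).
prefix = "You are a helpful assistant. Answer the question below concisely.\n"
questions = ["What is prefix caching?", "Why warm up before timing?"]
generating_prompts = [prefix + q for q in questions]

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)
prefix_cached_llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

# Warmup: generating on one prompt computes and stores the KV cache of the
# shared prefix, so the timed run below measures cache reuse, not cache fill.
prefix_cached_llm.generate(generating_prompts[0], sampling_params)

# Timed run with the prefix already cached.
start_time_cached = time()
outputs = prefix_cached_llm.generate(generating_prompts, sampling_params)
duration_cached = time() - start_time_cached
print(f"Generation with prefix caching took {duration_cached:.2f} s")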
