LLM Inference Performance on Context Length 200K
[Chart: Prefill Time (s) by method; series: Prefill Time (s), Decode Throughput (tok/s, per request), Decode Throughput (tok/s, full KV cache). Last updated Mar 12, 2026.]
Evaluation Results
| Method | Retention Ratio | Links | Prefill Time (s) | Decode Throughput (tok/s, per request) | Decode Throughput (tok/s, full KV cache) |
|---|---|---|---|---|---|
| IndexCache | 1/4 | 2026.03 | 10.7 | 86 | 297 |
| IndexCache | 1/2 | 2026.03 | 13.7 | 73 | 253 |
| DSA | Full | 2026.03 | 19.5 | 58 | 197 |
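The table above invites a direct comparison of the sparse IndexCache configurations against the full-cache DSA baseline. A minimal sketch (Python; the method labels and tuple layout are assumptions for illustration, the numbers are taken from the table) that computes the relative speedups:

```python
# Speedups of sparse-retention configurations over the DSA (Full) baseline,
# using the benchmark numbers from the table above.

# (prefill_time_s, decode_tok_s_per_request, decode_tok_s_full_kv)
results = {
    "IndexCache (1/4)": (10.7, 86, 297),
    "IndexCache (1/2)": (13.7, 73, 253),
    "DSA (Full)":       (19.5, 58, 197),
}

baseline = results["DSA (Full)"]

for method, (prefill, decode_req, decode_full) in results.items():
    # Prefill: lower time is better, so speedup = baseline / method.
    prefill_x = baseline[0] / prefill
    # Decode: higher throughput is better, so speedup = method / baseline.
    decode_req_x = decode_req / baseline[1]
    decode_full_x = decode_full / baseline[2]
    print(f"{method}: prefill {prefill_x:.2f}x, "
          f"decode/req {decode_req_x:.2f}x, "
          f"decode/full-cache {decode_full_x:.2f}x")
```

Under these numbers, retaining 1/4 of the cache yields roughly a 1.8x faster prefill and about 1.5x higher decode throughput than the full-cache baseline at 200K context.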