LLM Inference Performance on Context Length 10K
[Chart: Prefill Time (s) and Decode Throughput (tok/s, per request / full KV cache) across methods; updated Mar 12, 2026]
Evaluation Results

| Method | Links | Prefill Time (s) | Decode Throughput (tok/s, per request) | Decode Throughput (tok/s, full KV cache) |
|---|---|---|---|---|
| IndexCache (Retention Ratio=1/4) | 2026.03 | 0.45 | 91 | 3,310 |
| IndexCache (Retention Ratio=1/2) | 2026.03 | 0.47 | 84.5 | 3,070 |
| DSA (Retention Ratio=Full) | 2026.03 | 0.57 | 73.5 | 2,700 |
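A quick sketch of the relative gains implied by the table: comparing the most aggressive IndexCache setting (Retention Ratio=1/4) against the DSA full-cache baseline. The dictionary keys below are illustrative labels, not names from the source.

```python
# Numbers taken directly from the Evaluation Results table above.
baseline = {"prefill_s": 0.57, "decode_per_req": 73.5, "decode_full_kv": 2700}
indexcache = {"prefill_s": 0.45, "decode_per_req": 91.0, "decode_full_kv": 3310}

# Lower prefill time is better, so the ratio is baseline / method.
prefill_speedup = baseline["prefill_s"] / indexcache["prefill_s"]
# Higher throughput is better, so the ratio is method / baseline.
decode_speedup = indexcache["decode_per_req"] / baseline["decode_per_req"]
batch_speedup = indexcache["decode_full_kv"] / baseline["decode_full_kv"]

print(f"prefill {prefill_speedup:.2f}x, per-request decode {decode_speedup:.2f}x, "
      f"full-KV decode {batch_speedup:.2f}x")
```

Under these numbers, IndexCache at a 1/4 retention ratio comes out roughly 1.2-1.3x ahead of the DSA baseline on all three metrics at the 10K context length.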