LLM Inference Performance on Context Length 60K
[Chart: Prefill Time (s) over time for IndexCache (latest point: 2.59 s, Mar 12, 2026; y-axis range ~2.56–3.20). Additional views: Decode Throughput (tok/s, per request) and Decode Throughput (tok/s, full KV cache). Updated 1mo ago.]
Evaluation Results
| Method | Links | Prefill Time (s) | Decode Throughput (tok/s, per request) | Decode Throughput (tok/s, full KV cache) |
|---|---|---|---|---|
| IndexCache (Retention Ratio=1/4) | 2026.03 | 2.59 | 89.5 | 840 |
| IndexCache (Retention Ratio=1/2) | 2026.03 | 2.86 | 80 | 750 |
| DSA (Retention Ratio=Full) | 2026.03 | 3.38 | 67 | 613 |
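The retention ratio in the results above is the fraction of the KV cache kept for a request. As a rough illustration of why a lower ratio helps, here is a minimal sketch of KV-cache memory at a 60K-token context under each ratio. The model dimensions (layers, KV heads, head dimension, FP16 storage) are illustrative assumptions, not taken from the benchmark.

```python
def kv_cache_bytes(context_len, retention_ratio, n_layers=32,
                   n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV-cache size for one request.

    2 (K and V) * layers * KV heads * head_dim * retained tokens
    * bytes per element. All dimensions are hypothetical defaults.
    """
    retained = int(context_len * retention_ratio)
    return 2 * n_layers * n_kv_heads * head_dim * retained * bytes_per_elem

# Compare the three retention ratios from the table at a 60K context.
for label, ratio in [("1/4", 0.25), ("1/2", 0.5), ("Full", 1.0)]:
    gib = kv_cache_bytes(60_000, ratio) / 2**30
    print(f"Retention {label}: ~{gib:.2f} GiB per request")
```

Under these assumed dimensions, retaining 1/4 of the cache cuts per-request KV memory by 4x, which is consistent with the higher aggregate decode throughput reported for the lower retention ratios.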