LLM Inference Performance on Context Length 120K
[Chart: Prefill Time (s) for IndexCache over time, with additional series for Decode Throughput (tok/s, per request) and Decode Throughput (tok/s, full KV cache); last data point Mar 12, 2026. Updated 1 month ago.]
Evaluation Results

| Method | Links | Prefill Time (s) | Decode Throughput (tok/s, per request) | Decode Throughput (tok/s, full KV cache) |
|---|---|---|---|---|
| IndexCache (Retention Ratio = 1/4) | 2026.03 | 5.66 | 88 | 498 |
| IndexCache (Retention Ratio = 1/2) | 2026.03 | 6.57 | 77 | 431 |
| DSA (Retention Ratio = Full) | 2026.03 | 8.57 | 63 | 341 |
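A quick sanity check of the relative gains implied by the table: the sketch below computes the prefill speedup and per-request decode throughput gain of each configuration against the full-KV DSA baseline. The numbers are taken directly from the table; the short row labels and field names are my own shorthand, not part of the benchmark.

```python
# Relative gains over the full-KV DSA baseline, using the table's numbers
# (context length 120K). Labels/field names are illustrative shorthand.
rows = {
    "IndexCache (1/4)": {"prefill_s": 5.66, "decode_req": 88, "decode_full": 498},
    "IndexCache (1/2)": {"prefill_s": 6.57, "decode_req": 77, "decode_full": 431},
    "DSA (Full)":       {"prefill_s": 8.57, "decode_req": 63, "decode_full": 341},
}

base = rows["DSA (Full)"]
for name, r in rows.items():
    prefill_speedup = base["prefill_s"] / r["prefill_s"]  # lower time is better
    decode_gain = r["decode_req"] / base["decode_req"]    # higher tok/s is better
    print(f"{name}: prefill {prefill_speedup:.2f}x, per-request decode {decode_gain:.2f}x")
```

On these figures, retaining 1/4 of the cache yields roughly a 1.51x prefill speedup and a 1.40x per-request decode throughput gain over the full-cache baseline.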