Inference Efficiency on LLaMA 3.1 8B, 32K context length
[Chart: Theoretical Compute (TFLOPs); labeled series: SpecKV (1,115); date shown: Mar 11, 2026]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Links | Theoretical Compute (TFLOPs) | Theoretical Memory Traffic (GB) | Theoretical TTFT (ms) | Theoretical TTFT Overhead (ms) | Empirical TTFT (ms) | Empirical TTFT Overhead (ms) |
|---|---|---|---|---|---|---|---|---|
| SpecKV | Cache Budget (C)=128 | 2026.03 | 1,115 | 106 | 2,156 | 402.8 | 2,263 | 503 |
| LAQ | Cache Budget (C)=128 | 2026.03 | 930 | 451 | 1,993 | 239.26 | 2,314 | 554 |
| LOOKAHEADKV | Cache Budget (C)=128 | 2026.03 | 929 | 13 | 1,755 | 1.74 | 1,798 | 38 |
| Forward Pass Only | Cache Budget (C)=128 | 2026.03 | 928 | 13 | 1,754 | - | 1,760 | - |
| SnapKV | Cache Budget (C)=128 | 2026.03 | 928 | 13 | 1,754 | 0.01 | 1,838 | 78 |
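The page does not say how the overhead columns are derived. From the numbers, the Empirical TTFT Overhead of each method appears to be its Empirical TTFT minus the Forward Pass Only value (1,760 ms). The sketch below checks that reading against the reported values. The subtraction rule is an assumption inferred from the data, and the names `overhead_vs_baseline`, `empirical_ttft_ms`, and `reported_overhead_ms` are illustrative only.

```python
# Empirical TTFT (ms), copied from the Evaluation Results above.
empirical_ttft_ms = {
    "SpecKV": 2263,
    "LAQ": 2314,
    "LOOKAHEADKV": 1798,
    "SnapKV": 1838,
}

# Empirical TTFT (ms) of the "Forward Pass Only" row, used as the baseline.
baseline_ms = 1760

# Empirical TTFT Overhead (ms) as reported on the page.
reported_overhead_ms = {
    "SpecKV": 503,
    "LAQ": 554,
    "LOOKAHEADKV": 38,
    "SnapKV": 78,
}


def overhead_vs_baseline(ttft_ms, baseline):
    """Assumed rule: overhead = method TTFT - forward-pass-only TTFT."""
    return {method: ttft - baseline for method, ttft in ttft_ms.items()}


computed_overhead_ms = overhead_vs_baseline(empirical_ttft_ms, baseline_ms)

for method, overhead in computed_overhead_ms.items():
    relative = 100 * overhead / baseline_ms  # overhead as a share of the baseline
    print(
        f"{method}: computed {overhead} ms "
        f"(reported {reported_overhead_ms[method]} ms, "
        f"{relative:.1f}% of baseline)"
    )
```

Under this assumption the computed and reported empirical overheads agree for all four methods. The theoretical columns follow the same pattern only approximately, since the displayed TTFT values are rounded to whole milliseconds (for example, 2,156 - 1,754 = 402 against a reported 402.8).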