
Inference Efficiency on LLaMA 3.1 8B, 32K context length

SpecKV: 1,115 Theoretical Compute (TFLOPs)

Updated 4mo ago

Evaluation Results

Method	Links	Theoretical Compute (TFLOPs)
SpecKV 2026.03		1,115	106	2,156	402.8	2,263	503
LAQ 2026.03		930	451	1,993	239.26	2,314	554
LOOKAHEADKV 2026.03		929	13	1,755	1.74	1,798	38
Forward Pass Only 2026.03		928	13	1,754	-	1,760	-
SnapKV 2026.03		928	13	1,754	0.01	1,838	78