Inference Efficiency on 32k Context Length Llama-3-8B (test)
[Chart: Time To First Token (s) for the Full Model baseline over time, around 4.11-4.25 s (latest point: Mar 15, 2026). Selectable metrics: Time To First Token (s), Time Per Output Token (s), Memory Usage (GB).]
Evaluation Results
| Method | Configuration | Date | Time To First Token (s) | Time Per Output Token (s) | Memory Usage (GB) |
|---|---|---|---|---|---|
| Full Model | Model=Llama-3-8B, Cont... | 2026.03 | 4.12 | 0.081 | 24.27 |
| StreamingLLM | Model=Llama-3-8B, Cont... | 2026.03 | 4.12 | 0.032 | 16.12 |
| H2O | Model=Llama-3-8B, Cont... | 2026.03 | 4.12 | 0.033 | 16.58 |
| SnapKV | Model=Llama-3-8B, Cont... | 2026.03 | 4.13 | 0.032 | 16.37 |
| SemantiCache | Model=Llama-3-8B, Cont... | 2026.03 | 4.25 | 0.031 | 15.94 |
| CaM | Model=Llama-3-8B, Cont... | 2026.03 | 4.28 | 0.039 | 17.03 |
| D2O | Model=Llama-3-8B, Cont... | 2026.03 | 4.29 | 0.038 | 16.91 |
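The two latency metrics reported above can be measured with a simple timing harness over any token stream: time to first token (TTFT) is the delay from request start until the first token arrives, and time per output token (TPOT) is the average gap between subsequent tokens. A minimal sketch follows; `fake_stream` is a hypothetical stand-in for a real model's streaming output, not part of any benchmarked method.

```python
import time

def measure_streaming_latency(stream):
    """Measure TTFT and average TPOT for any iterator that yields tokens."""
    start = time.perf_counter()
    first_token_time = None
    n_tokens = 0
    for _ in stream:
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now  # first token observed -> TTFT reference
        n_tokens += 1
    ttft = first_token_time - start
    # TPOT averages the decode interval over tokens after the first one.
    tpot = (time.perf_counter() - first_token_time) / max(n_tokens - 1, 1)
    return ttft, tpot

# Hypothetical token stream; a real run would iterate over the model
# server's streaming response instead.
def fake_stream(n=5, delay=0.01):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

ttft, tpot = measure_streaming_latency(fake_stream())
print(f"TTFT={ttft:.3f}s  TPOT={tpot:.3f}s")
```

With this convention, total generation latency for `n` output tokens is roughly `TTFT + (n - 1) * TPOT`, which is why the KV-cache compression methods above cut TPOT and memory sharply while leaving TTFT (dominated by the 32k-token prefill) nearly unchanged.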