Long-context language modeling on InfiniteBench (test)

34.82 En QA Score

Llama 3.1 8B Instruct

Updated 3mo ago

Evaluation Results

Method	Links
Llama 3.1 8B Instruct 2025.11		34.82	-	-	-	-	-	-	100	99.32	87	-
SAGE-KV 2025.11		34.69	-	-	-	-	-	-	100	96.1	1	-
SnapStream 2025.11		32.44	-	-	-	-	-	-	100	92.2	59	-
StreamingLLM 2025.11		27.57	-	-	-	-	-	-	6.78	6.44	2.4	-
SnapKV 2026.01		1.01	1	0.99	0.99	0.99	1.01	1.01	1.02	1.01	1.01	1
SnapKV 2026.01		1	1	1	1	1	1	1	1	1	1	1
ASL_2pass 2026.01		0.83	0.8	0.9	0.85	0.84	0.83	0.72	0.76	0.7	0.73	0.82
ASL 2026.01		0.82	0.8	0.9	0.85	0.85	0.82	0.73	0.77	0.7	0.73	0.82
ASL_2pass 2026.01		0.74	0.58	0.77	0.71	0.71	0.69	0.67	0.48	0.62	0.68	0.69
ASL 2026.01		0.73	0.58	0.76	0.71	0.71	0.67	0.67	0.47	0.6	0.68	0.68
FastKV 2026.01		0.53	0.54	0.54	0.53	0.54	0.54	0.54	0.54	0.53	0.53	0.53
GemFilter 2026.01		0.53	0.54	0.53	0.53	0.53	0.53	0.53	0.54	0.52	0.53	0.53
FastKV 2026.01		0.5	0.5	0.51	0.5	0.51	0.5	0.5	0.52	0.5	0.5	0.5
GemFilter 2026.01		0.44	0.44	0.44	0.43	0.44	0.45	0.44	0.45	0.44	0.44	0.44