Long-context language modeling on RULER 1.0 (test)

0.977 Accuracy (4K Context)

MInference

Updated 5mo ago

Evaluation Results

Method	Links	4K	8K	16K	32K	64K	128K	Avg.
MInference 2024.07		0.977	0.912	0.885	0.85	0.823	0.776	0.87
LLAMA-3-8B-262K 2024.07		0.972	0.918	0.873	0.808	0.774	0.722	0.844
StreamingLLM 2024.07		0.972	0.381	0.375	0.172	0.142	0.094	0.35
InfLLM 2024.07		0.947	0.895	0.764	0.665	0.568	0.535	0.729
MInference 2024.07		0.946	0.931	0.91	0.896	0.855	0.84	0.896
GLM-4-9B-1M 2024.07		0.938	0.916	0.893	0.874	0.852	0.808	0.88
StreamingLLM 2024.07		0.938	0.669	0.585	0.514	0.459	0.391	0.593
MInference 2024.07		0.923	0.897	0.79	0.738	0.647	0.569	0.747
Yi-9B-200K 2024.07		0.919	0.902	0.788	0.763	0.681	0.629	0.781
StreamingLLM 2024.07		0.919	0.378	0.339	0.186	0.13	0.128	0.343
InfLLM 2024.07		0.894	0.798	0.701	0.556	0.43	0.395	0.629
InfLLM 2024.07		0.803	0.839	0.607	0.452	0.386	0.302	0.565
StreamingLLM w/ dilated 2024.07		0.448	0.428	0.385	0.298	0.268	0.239	0.344
StreamingLLM w/ dilated 2024.07		0.234	0.007	0.014	0.188	0.165	0.156	0.127
StreamingLLM w/ strided 2024.07		0.026	0.007	0.006	0.006	0.012	0.005	0.011
StreamingLLM w/ strided 2024.07		0.02	0.007	0.006	0.006	0.007	0.013	0.01
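
The last value in each row appears to be the mean of the six accuracies before it. The page does not state this, so treat it as an assumption. The sketch below recomputes that mean for three rows copied from the table. The helper name `mean_and_reported` and the row labels are illustrative and not part of the leaderboard.

```python
# Rows copied from the table above: six accuracies, then the reported final value.
# Assumption: the final value is the unweighted mean of the six preceding scores.
rows = {
    "MInference (first row)": [0.977, 0.912, 0.885, 0.85, 0.823, 0.776, 0.87],
    "LLAMA-3-8B-262K": [0.972, 0.918, 0.873, 0.808, 0.774, 0.722, 0.844],
    "GLM-4-9B-1M": [0.938, 0.916, 0.893, 0.874, 0.852, 0.808, 0.88],
}


def mean_and_reported(values):
    """Split a row into its six scores and its reported final value,
    and return (recomputed mean, reported value)."""
    *scores, reported = values
    return sum(scores) / len(scores), reported


for name, values in rows.items():
    computed, reported = mean_and_reported(values)
    gap = abs(computed - reported)
    print(f"{name}: computed {computed:.4f}, reported {reported}, gap {gap:.4f}")
```

For these three rows the recomputed mean agrees with the reported value to within rounding. A few other rows in the table differ from the plain mean by up to about 0.014, so the reported values were probably computed from unrounded scores or by a slightly different procedure.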