Long-context language understanding on RULER 16k context length

94.73 Average Score

RetroInfer

[Chart: Average Score over time, Nov 18, 2025 to Feb 4, 2026]
Updated 3d ago

Evaluation Results

Method | Links
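2025.11 | 94.73 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 94.36 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 94.35 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 94.06 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 92.78 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 86.56 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 86.36 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 86.07 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 85.65 | - | - | - | - | - | - | - | - | - | - | -
2025.11 | 79.7 | - | - | - | - | - | - | - | - | - | - | -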
2026.02 | 64.6 | 100 | 93 | 95.7 | 81 | 19.8 | 0 | 74.2 | 53.4 | - | - | -
2026.02 | 61.4 | 100 | 81.8 | 96.3 | 68.7 | 19.6 | 0 | 71 | 53.4 | - | - | -
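2026.01 | 50.6 | - | - | - | - | - | - | - | - | - | - | -
2026.01 | 40.2 | - | - | - | - | - | - | - | - | - | - | -
2026.01 | 23.4 | - | - | - | - | - | - | - | - | - | - | -
2026.01 | 20.6 | - | - | - | - | - | - | - | - | - | - | -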
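2026.02 | - | 100 | 100 | 98.5 | 97 | 97.4 | 93.17 | 80.5 | 51.5 | - | - | -
2026.02 | - | 100 | 100 | 98 | 97 | 97.5 | 93.17 | 79.5 | 51.5 | - | - | -
2026.02 | - | 100 | 100 | 97.5 | 98.5 | 97.6 | 93.17 | 73.5 | 48 | - | - | -
2026.02 | - | 100 | 100 | 94 | 98 | 96.2 | 93.83 | 76 | 48 | - | - | -
2026.02 | - | 100 | 100 | 98 | 99 | 89 | 91.33 | 73 | 45.5 | - | - | -
2026.02 | - | 100 | 100 | 98.5 | 98.25 | 88.8 | 88.67 | 88 | 54 | - | - | -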
2024.09
-----24--36.785.852
2024.09
-----1.45.33--38.883.929
2024.09
-----027.33--32.75215.5
2024.09
-----02.67--0.4120.920
2024.09
-----1.610.33--2.067.328.5
2024.09
-----5.42.33--66.440
2024.09
-----10.23--57.780.71