
Long-context Question Answering on LongBench N=162

31.5 F1 Score

Ref. ceiling

[Chart: F1 Score, May 18, 2026]
Updated 14d ago

Evaluation Results

Method    Date       F1 Score    % of ref. ceiling    Links
          2026.05    31.5        100
          2026.05    30          95.4
          2026.05    29.9        95.1
          2026.05    29.8        -
          2026.05    29.8        -
          2026.05    29.8        -
          2026.05    29.8        -
          2026.05    29          91.9
          2026.05    29          91.9
          2026.05    29          91.9
          2026.05    29          91.9
          2026.05    28.9        -
          2026.05    28.5        -
          2026.05    28.2        89.4
          2026.05    25          79.5
          2026.05    23          -
          2026.05    23          -
          2026.05    23          -
          2026.05    23          -
          2026.05    22.9        -
          2026.05    22.5        -
          2026.05    19.8        -
          2026.05    18.4        58.5
          2026.05    12.4        -
          2026.05    3.8         12.2
          2026.05    3.8         -
          2026.05    3.8         12
          2026.05    3.7         -
          2026.05    3.5         -
          2026.05    3           -
          2026.05    3           -
          2026.05    3           -
          2026.05    2.7         8.4
          2026.05    2.7         8.5
          2026.05    2.1         -
          2026.05    1.9         -
          2026.05    1.1         3.6
          2026.05    1.1         -
          2026.05    1           -
          2026.05    0.9         -
          2026.05    0.8         2.4
          2026.05    0.5         1.6
          2026.05    0.5         -
          2026.05    0.4         -
          2026.05    0.4         -
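
The second figure listed for some entries appears to be the F1 score expressed as a percentage of the reference ceiling (31.5). The page does not state this, so treat it as an assumption. A minimal Python sketch of that arithmetic follows; the function name `percent_of_ceiling` is hypothetical. Results can differ from the listed values by a tenth or two, presumably because the displayed F1 scores are already rounded.

```python
def percent_of_ceiling(f1: float, ceiling: float = 31.5) -> float:
    """Express an F1 score as a percentage of the reference ceiling.

    Assumption: the ceiling is the top score on the page (31.5).
    The result is rounded to one decimal place.
    """
    return round(100.0 * f1 / ceiling, 1)


if __name__ == "__main__":
    # A few F1 scores taken from the results above.
    for f1 in (31.5, 25.0, 18.4, 0.5):
        print(f"{f1:5.1f} F1 -> {percent_of_ceiling(f1):5.1f}% of ceiling")
```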