
Research Paper Reasoning and Comprehension on ArxivRollBench 2025a (val)

44.3 Valid Accuracy

google/gemini-2.5-flash

Jul 25, 2025
Updated 8d ago

Evaluation Results

Method Links
44.3
43.1
42.9
42.7
41.2
40.5
39.5
2025.07
37
2025.07
36.2
2025.07
35.3
33
32.2
30.1
29.6
29
2025.07
28.4
28.4
2025.07
28
27.5
23.4
22.6
2025.07
22.3
21.8
21.7
2025.07
21.4
18.3
16.9
13.4
12.8
2025.07
12.5
2025.07
9.2
8.4
7.3
2025.07
7.2
4.8
2025.07
4.4
2025.07
0.2
0