Commonsense Reasoning on ARC-E

96.4 Accuracy

Self-consistency

[Chart: accuracy over time, Mar 21, 2022 to Apr 3, 2026]
Updated 11d ago

Evaluation Results

Method | Links
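| Date | Accuracy | | |
|---|---|---|---|
| 2022.03 | 96.4 | - | - |
| 2022.03 | 96 | - | - |
| 2022.03 | 95.3 | - | - |
| 2022.03 | 94 | - | - |
| 2026.04 | 91.38 | 4.9 | 0.36 |
| 2026.04 | 91.38 | 3.71 | 0.33 |
| 2026.04 | 91.2 | 3.72 | 0.33 |
| 2026.04 | 91.19 | 5.76 | 0.35 |
| 2026.04 | 91.03 | 3.77 | 0.35 |
| 2026.04 | 90.66 | 8.95 | 0.74 |
| 2026.04 | 90.66 | 4.51 | 0.61 |
| 2026.04 | 90.4 | 5.42 | 0.35 |
| 2026.04 | 90.25 | 9.02 | 0.79 |
| 2026.04 | 90.25 | 6.15 | 0.39 |
| 2026.04 | 90.25 | 5.41 | 0.36 |
| 2026.04 | 90.16 | 5.66 | 0.3 |
| 2024.06 | 86.68 | 9.43 | 0.56 |
| 2024.06 | 86.55 | 12.07 | 0.9 |
| | 86.4 | - | - |
| 2024.06 | 86.21 | 12.2 | 1 |
| 2026.01 | 85.91 | 5.3 | 0.38 |
| 2024.06 | 85.86 | 12.28 | 0.91 |
| 2024.06 | 85.65 | 13.12 | 1.17 |
| 2024.06 | 85.56 | 3.64 | 0.4 |
| 2026.01 | 85.56 | 3.64 | 0.4 |
| 2026.01 | 85.4 | 3.4 | 0.41 |
| 2026.01 | 85.2 | 8.9 | 0.54 |
| 2026.01 | 85.2 | 8.9 | 0.54 |
| 2026.01 | 85.2 | 4.7 | 0.43 |
| 2026.01 | 85.1 | 8.8 | 0.52 |
| 2026.02 | 85 | - | - |
| 2026.01 | 84.8 | 4.2 | 0.44 |
| 2024.06 | 84.74 | 6.16 | 0.46 |
| 2024.06 | 84.4 | 12.57 | 0.82 |
| 2026.03 | 81.4 | - | - |
| 2026.03 | 81.2 | - | - |
| 2026.03 | 80.9 | - | - |
| 2024.06 | 80.05 | 33.29 | 0.88 |
| 2026.03 | 80 | - | - |
| 2022.03 | 79.3 | - | - |
| 2026.01 | 77 | 12.28 | 0.91 |
| 2022.03 | 75.3 | - | - |
| 2025.02 | 73.39 | - | - |
| 2025.02 | 73.23 | - | - |
| 2025.02 | 72.78 | - | - |
| 2025.02 | 72.63 | - | - |
| 2022.03 | 72.1 | - | - |
| 2025.02 | 71.99 | - | - |
| 2025.02 | 71.91 | - | - |
| 2026.03 | 71.5 | - | - |
| 2025.02 | 71.17 | - | - |
| 2026.02 | 71 | - | - |
| 2025.02 | 70.8 | - | - |
| 2025.02 | 70.61 | - | - |
| 2025.02 | 70.49 | - | - |
| 2026.02 | 70 | - | - |
| 2022.03 | 69.8 | - | - |
| 2025.02 | 69.61 | - | - |
| 2025.02 | 69.58 | - | - |
| 2026.03 | 69.44 | - | - |
| 2025.02 | 69.17 | - | - |
| 2026.03 | 69.11 | - | - |
| 2026.03 | 69.02 | - | - |
| 2025.02 | 68.87 | - | - |
| 2026.03 | 68.77 | - | - |
| 2026.03 | 68.22 | - | - |
| 2025.02 | 68.13 | - | - |
| 2026.03 | 68.01 | - | - |
| 2026.02 | 68 | - | - |
| 2026.03 | 67.97 | - | - |
| 2026.03 | 67.89 | - | - |
| 2026.03 | 67.63 | - | - |
| 2026.03 | 67.13 | - | - |
| 2026.03 | 66.16 | - | - |
| 2025.07 | 63.2 | - | - |
| 2022.03 | 63.1 | - | - |
| 2022.03 | 61.6 | - | - |
| 2025.05 | 57.7 | - | - |
| 2025.05 | 57.15 | - | - |
| 2026.03 | 53.91 | - | - |
| 2026.03 | 52.4 | - | - |
| 2026.01 | 51.14 | - | - |
| 2026.03 | 50.97 | - | - |
| 2026.03 | 50.21 | - | - |
| 2026.03 | 48.74 | - | - |
| 2026.01 | 48.48 | - | - |
| 2026.03 | 47.81 | - | - |
| 2025.07 | 46.2 | - | - |
| 2026.03 | 45.41 | - | - |
| 2025.07 | 44.3 | - | - |
| 2025.07 | 43.4 | - | - |
| 2026.01 | 39.98 | - | - |
| 2026.01 | 36.66 | - | - |
| 2026.03 | 33.16 | - | - |
| 2026.03 | 32.66 | - | - |
| 2026.03 | 32.37 | - | - |
| 2026.03 | 31.27 | - | - |
| 2026.01 | 31.02 | - | - |
| 2026.01 | 30.22 | - | - |
| 2025.07 | 30 | - | - |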
Showing 100 of 106 rows