
Scientific Reasoning on ARC Challenge

92.5 Accuracy

Qwen-3-32B

[Chart: scores over time, x-axis Oct 7, 2025 to Apr 10, 2026]
Updated 5d ago

Evaluation Results

Date     Accuracy                  Method  Links
2026.02  92.5      -       -
2026.04  92.5      -       -
2026.02  92.3      -       -
2026.02  91.5      -       -
2026.02  91.4      -       -
2026.02  90.3      -       -
2026.02  90.1      -       -
2026.02  90.1      -       -
2026.02  90        -       -
2026.02  89.6      -       -
2026.02  89.2      -       -
2026.02  88.4      -       -
2026.02  87.2      -       -
2026.02  84.6      -       -
2026.02  82.7      -       -
2026.02  79.6      -       -
2026.04  79.3      -       -
2026.02  79        -       -
2026.02  78.7      -       -
2026.02  78.6      -       -
2026.04  77.86     -       -
2026.04  77.4      -       -
2026.04  76.7      -       -
2026.04  76.11     -       -
2026.04  73.8      -       -
2026.02  72.2      -       -
2026.02  69.9      -       -
2025.10  68.52     -       -
2026.02  68.1      -       -
         66.9      -       -
2025.10  66.64     -       -
2025.10  66.3      -       -
2025.10  64.51     -       -
2025.10  63.65     -       -
2025.10  63.4      -       -
2025.10  62.37     -       -
2025.10  62.29     -       -
2025.10  60.84     -       -
2025.10  60.15     -       -
2025.10  59.55     -       -
2025.10  57.08     -       -
2025.10  56.31     -       -
2025.10  55.72     -       -
2025.10  55.29     -       -
2026.02  54.6      -       -
2025.10  53.75     -       -
2025.10  53.58     -       -
2025.10  50.51     -       -
2025.10  50.34     -       -
2026.04  49.4      -       -
2025.10  48.89     -       -
2025.10  48.63     -       -
2025.10  48.21     -       -
2025.10  46.16     -       -
2026.02  44.8      -       -
2025.10  44.54     -       -
2026.02  44.4      -       -
2026.02  44.36     -       -
2026.02  44.28     -       -
2025.10  43.77     -       -
2025.10  43.43     -       -
2026.02  43.09     -       -
2026.02  41.98     -       -
2026.02  41.97     -       -
2025.10  41.55     -       -
2026.02  40.35     -       -
2026.02  40.2      -       -
2025.10  40.02     -       -
2025.10  39.16     -       -
2026.02  38.99     -       -
2026.02  38.82     -       -
2026.02  38.73     -       -
2025.10  37.8      -       -
2026.02  35.24     -       -
2026.02  34.89     -       -
2026.02  34.47     -       -
2026.02  34.13     -       -
2026.02  33.78     -       -
2026.02  32.85     -       -
2025.12  0.826     0.705   -
2025.12  0.822     0.652   -
2025.12  0.82      0.687   -
2025.12  0.817     0.578   -
2025.12  0.814     0.691   -
2025.12  0.811     0.682   -
2025.12  0.776     0.603   -
2025.12  0.742     0.581   -
2025.12  0.737     0.579   -
2025.12  0.715     0.572   -
2025.12  0.682     0.556   -
2025.12  0.678     0.519   -
2025.12  0.637     0.635   -
2025.12  0.602     0.433   -
2025.12  0.474     0.461   -
2025.10  -         -       18.2
2025.10  -         -       70.3
2025.10  -         -       73.3
2025.10  -         -       70.1
2025.10  -         -       70.3
2025.10  -         -       73.8
Showing 100 of 108 rows