
Reasoning on ARC Challenge (exact_match, flexible_extract)

95.1 Exact Match Accuracy

GPT-5 nano

[Chart omitted] May 8, 2026

Evaluation Results

Date | Score (col. 1) | Score (col. 2)
2026.05 | 95.1 | -
2026.05 | 90.5 | -
2026.05 | 90 | -
2026.05 | 89.6 | -
2026.05 | 88.1 | -
2026.05 | 88 | -
2026.05 | 81.7 | -
2026.05 | 80.9 | -
2026.05 | 75.7 | -
2026.05 | 75.6 | -
2026.05 | 71 | -
2026.05 | 65.9 | -
2026.05 | 45.7 | -
2026.05 | 45.5 | -
2026.05 | 39.8 | -
2026.05 | 26.2 | -
2026.05 | - | 87.2
2026.05 | - | 56.2
2026.05 | - | 25.4
2026.05 | - | 74.1
2026.05 | - | 75.4
2026.05 | - | 93
2026.05 | - | 95.1
2026.05 | - | 73.7
2026.05 | - | 80.5
2026.05 | - | 77.8
2026.05 | - | 92.3
2026.05 | - | 94.2
2026.05 | - | 94.4
2026.05 | - | 54.9
2026.05 | - | 95.4
2026.05 | - | 78.9
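The page reports ARC Challenge results as exact match under a "flexible_extract" setting but does not say how answers are extracted. The sketch below shows one plausible reading, assuming a hypothetical regex filter that takes the last standalone choice label (A to E or 1 to 5) from a free-form response and compares it to the gold label. The function names `flexible_extract` and `exact_match_accuracy` and the regex itself are illustrative assumptions, not the leaderboard's actual scoring code.

```python
import re
from typing import List, Optional

# Hypothetical flexible-extract filter (an assumption, not the leaderboard's
# real rule): match uppercase choice labels A-E or digits 1-5 that stand alone
# as whole words, e.g. "(B)" or "Answer: 4".
_CHOICE_RE = re.compile(r"\b([A-E1-5])\b")


def flexible_extract(response: str) -> Optional[str]:
    """Return the last standalone choice label in the response, or None."""
    matches = _CHOICE_RE.findall(response)
    return matches[-1] if matches else None


def exact_match_accuracy(responses: List[str], gold: List[str]) -> float:
    """Percentage of responses whose extracted label exactly equals the gold label."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    if not gold:
        return 0.0
    # A response with no extractable label (None) never matches.
    hits = sum(flexible_extract(r) == g.strip() for r, g in zip(responses, gold))
    return 100.0 * hits / len(gold)


if __name__ == "__main__":
    responses = ["The answer is (B).", "I think C", "Answer: 4", "no idea"]
    gold = ["B", "C", "3", "A"]
    print(exact_match_accuracy(responses, gold))  # prints 50.0
```

Taking the last match rather than the first is a design choice for responses that reason before answering; it can still misfire when a stray capital letter such as the article "A" follows the real answer, so a stricter pattern may be needed in practice.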