Science Question Answering on ARC-C

96.3 Accuracy

GPT-4

[Chart: accuracy over time, Oct 20, 2022 to Apr 7, 2026]
Updated 9d ago

Evaluation Results

Method  Links
2024.07
96.3---
96.1---
94.3---
92.9---
91.9---
2025.12
89.83---
2022.10
89.8---
2022.10
88.7---
2022.10
88.7---
2026.02
88.52350--
2025.12
88.5---
2022.10
88.3---
2026.03
87.4---
2022.10
87.2---
2022.10
87.1---
2026.03
86.9---
2026.02
86.73301--
2025.12
86.44---
2026.02
85.95361--
2026.03
85.6---
2022.10
85.2---
2026.03
83.7---
2026.02
83.05585--
2025.12
82.8---
2025.12
82.1---
2026.02
81.17793--
2026.02
81.05366--
2025.12
80.34---
2025.12
80---
2024.07
79.7---
2025.12
79.32---
2026.02
79.15642--
2026.03
79---
2026.03
78.9---
2024.07
78.6---
2024.07
78.2---
2025.12
78---
2026.03
77.9---
2025.12
77.6---
2025.12
77.29---
2025.12
77.2---
2026.03
75.8---
2026.02
75.48---
2026.02
75.13---
2025.12
74.9---
2026.02
74.78---
2026.02
74.7---
2026.02
74.18---
2026.02
74.13---
2026.02
74.02353--
2026.02
73.61---
2026.03
73.6---
2026.02
72.91617--
2025.12
72.2---
2025.12
72.2---
2026.03
71.2---
2025.12
70.5---
2025.12
70.17---
2026.02
69.79369--
2026.03
68.3---
2025.12
68.14---
2026.02
67.67785--
2026.02
67.22610--
2025.12
66.44---
2025.12
66.1---
2026.02
64.88628--
2025.12
64.75---
2026.02
64.33769--
2026.02
64.1614--
2026.01
63.7---
2025.12
62.4---
2026.03
61.8---
2026.02
61.53---
2026.02
61.35---
2026.02
60.84---
2026.01
60.6---
2026.02
60.15---
2026.04
60.12---
2026.01
60---
2025.12
59.66---
2026.04
59.64---
2026.02
59.22---
2025.12
59---
2025.12
58---
2025.12
57.6---
2026.04
57.42---
2026.02
57.34---
2025.12
56.6---
2026.04
56.48---
2026.04
56.31---
2025.05
56.19---
2025.12
55.9---
2026.04
54.86---
2025.05
54.84---
2025.05
54.51---
2026.04
54.44---
2026.01
54.4---
2025.05
53.51---
2026.02
53.5---
2026.04
53.33---
Showing 100 of 205 rows