Science Question Answering on ARC-C

96.3 Accuracy

GPT-4

[Chart: Accuracy on ARC-C over time, Oct 20, 2022 to May 12, 2026]
Updated 5d ago

Evaluation Results

Method  Links
2024.07
96.3---
96.1---
94.3---
92.9---
91.9---
2026.05
91.4---
2025.12
89.83---
2022.10
89.8---
2026.05
89.6---
2022.10
88.7---
2022.10
88.7---
2026.02
88.52350--
2025.12
88.5---
2026.05
88.4---
2022.10
88.3---
2026.03
87.4---
2022.10
87.2---
2022.10
87.1---
2026.03
86.9---
2026.05
86.9---
2026.02
86.73301--
2026.05
86.5---
2025.12
86.44---
2026.02
85.95361--
2026.04
85.9---
2026.03
85.6---
2026.04
85.5---
2022.10
85.2---
2026.04
85.2---
2026.05
84.6---
2026.03
83.7---
2026.04
83.7---
2026.04
83.5---
2026.04
83.2---
2026.04
83.1---
2026.02
83.05585--
2025.12
82.8---
2026.04
82.4---
2026.04
82.2---
2025.12
82.1---
2026.02
81.17793--
2026.02
81.05366--
2025.12
80.34---
2025.12
80---
2024.07
79.7---
2026.05
79.4---
2025.12
79.32---
2026.02
79.15642--
2026.05
79.1---
2026.03
79---
2026.03
78.9---
2024.07
78.6---
2026.05
78.3---
2024.07
78.2---
2026.04
78.1---
2025.12
78---
2026.03
77.9---
2026.05
77.8---
2025.12
77.6---
2025.12
77.29---
2025.12
77.2---
2026.05
76.4---
2026.03
75.8---
2026.05
75.6---
2026.02
75.48---
2026.02
75.13---
2025.12
74.9---
2026.02
74.78---
2026.02
74.7---
2026.02
74.18---
2026.02
74.13---
2026.02
74.02353--
2026.02
73.61---
2026.03
73.6---
2026.02
72.91617--
2025.12
72.2---
2025.12
72.2---
2026.03
71.2---
2025.12
70.5---
2025.12
70.17---
2026.02
69.79369--
2026.04
68.4---
2026.03
68.3---
2025.12
68.14---
2026.02
67.67785--
2026.02
67.22610--
2026.05
66.9---
2025.12
66.44---
2025.12
66.1---
2026.02
64.88628--
2025.12
64.75---
2026.02
64.33769--
2026.02
64.1614--
2026.01
63.7---
2025.12
62.4---
2026.03
61.8---
2026.02
61.53---
2026.02
61.35---
2026.02
60.84---
2026.01
60.6---
Showing 100 of 273 rows