
Question Answering on ARC Challenge (val)

Accuracy: 93.3

Pioneer Agent (Qwen3-8B)

[Chart: accuracy over time, Apr 2020 to Apr 2026]
Updated 5d ago

Evaluation Results

Date      Accuracy  (unlabeled)
2026.04   93.3      1.6
2026.04   91.7      -
2026.02   89.6      -
2026.02   89.6      -
2026.02   88.3      -
2026.02   73.9      -
2026.02   72.6      -
2026.04   72.6      67.3
2026.02   72.2      -
2020.04   51.5      -
2020.04   50.8      -
2020.04   50.8      -
2024.05   50.34     -
2024.05   49.91     -
2020.04   49.5      -
2024.05   49.15     -
2024.05   49.06     -
2024.05   48.29     -
2024.05   47.87     -
2024.05   47.78     -
2024.05   47.53     -
2024.05   46.76     -
2024.05   46.67     -
2024.05   46.42     -
2024.05   45.99     -
2024.05   45.99     -
2024.05   45.65     -
2024.05   45.14     -
2024.05   44.97     -
2024.05   44.11     -
2024.05   44.03     -
2020.04   43.5      -
2024.05   43.43     -
2024.05   43.34     -
2024.05   43.17     -
(no date) 43.1      -
2024.05   42.75     -
2024.05   42.32     -
2024.05   41.72     -
2024.05   41.55     -
2024.05   41.47     -
2024.05   41.38     -
2024.05   39.93     -
2024.05   39.59     -
2024.05   38.74     -
2024.05   38.74     -
2024.05   38.57     -
2024.05   38.48     -
2024.05   37.12     -
2024.05   35.32     -
2024.05   34.64     -
2024.05   34.47     -
2024.01   34.3      -
2024.05   34.3      -
2024.05   33.79     -
2024.05   33.79     -
2024.05   33.28     -
2024.05   33.19     -
2024.05   32.59     -
2024.05   32        -
2024.01   31.6      -
2024.05   31.48     -
2024.01   30.2      -
2024.05   30.2      -
2024.05   30.2      -
2024.05   29.69     -
2024.01   28.2      -
2024.01   26        -
2024.05   24.66     -
2024.05   22.44     -
2024.05   21.76     -
2024.05   21.67     -
2024.05   21.33     -
2024.05   20.65     -
2024.05   19.88     -
2026.04   5.3       -