
Common Sense Reasoning on ARC-Challenge (0-shot)

Accuracy: 57.2

fp16

[Chart: accuracy over time, Oct 30, 2025 to Feb 16, 2026]
Updated 9d ago

Evaluation Results

Date      Accuracy
2026.02   57.2
2026.02   56.2
2026.02   54.8
2026.02   54.3
2026.02   48.8
2026.02   47.4
2026.02   46.5
2026.02   46.2
2026.02   45.5
2026.02   45.1
2026.02   43.3
2026.02   42.2
2026.02   40.5
2025.10   40.1
2026.02   39.6
2025.10   39.51
2025.10   39.07
2025.10   38.39
2025.10   37.88
2025.10   37.84
2025.10   36.17
2025.10   35.66
2025.10   35.4
2025.10   35.32
2025.10   35.1
2025.10   33.78
2026.02   29.3
2026.02   28.5
2026.02   24.7
2026.02   24.5
2026.02   18.8