
Commonsense Reasoning on CommonsenseQA (test)

Accuracy: 90

Human-Rater

[Chart: accuracy over time, May 19, 2023 to Apr 7, 2026]
Updated 9d ago

Evaluation Results

Date      Accuracy
2023.05   90
2026.04   84.9
2026.04   84.8
2026.04   84.6
2026.04   82.4
2026.04   82
2026.04   81.5
2023.05   80
2026.04   79.6
2026.04   79.2
2026.04   78.1
2023.05   77.9
2026.04   76.4
2026.04   76
2023.05   75.7
2026.04   75.5
2023.05   75.2
2026.04   74.3
2026.04   73.6
2026.04   73.6
2023.05   73.5
2026.04   73.5
2026.04   73.3
2023.05   72.6
2026.04   71.4
2023.05   69.3
2025.02   67.63
2025.02   66.79
2025.02   66.51
2025.02   66.44
2025.02   65.04
2025.02   64.32
2025.02   63.17
2025.02   63.04
2025.02   62.91
2025.02   62.59
2025.02   62.48
2025.02   62.47
2025.02   62.28
2025.02   62.23
2025.02   62.21
2025.02   62.13
2025.02   61.71
2025.02   61.32
2025.02   61.22
2025.02   61.09
2025.02   60.81
2025.02   60.65
2025.02   60.21
2025.02   57.5
2025.02   53.97
2025.02   52.65
2026.04   50.8
2025.02   49.6
2026.04   49.6
2026.04   49.5
2025.02   48.47
2025.10   33
2025.10   31
2025.10   30
2025.10   29
2023.05   20