
Commonsense Reasoning on CommonsenseQA (pass@1 accuracy)

Top accuracy (pass@1): 86.6


[Chart: accuracy (pass@1) over time, Mar 2024 to Apr 2026]
Updated 4d ago

Evaluation Results

Date (YYYY.MM)   Accuracy (pass@1)
2026.04          86.6
2026.04          86.2
2026.04          86
2026.04          86
2025.10          85.34
2026.04          84.8
2026.04          84
2025.10          83.95
2025.10          83.95
2026.04          83.8
2025.10          83.7
2026.04          83
2026.04          82.8
2026.04          82.4
2026.04          82
2026.04          81.8
2026.04          81.8
2026.04          81.8
2026.04          81.4
2026.04          81
2026.04          81
2026.04          80.8
2024.03          73.81
2024.03          71.33
2024.03          68.96
2026.04          68.9
2026.04          63.4
2026.04          55.6
2026.04          53.2
2026.04          52.22
2026.04          51.6
2026.04          49.6
2026.04          47.2
2026.04          44.8
2026.04          44.4
2026.04          43.4
2026.04          42.5
2026.04          42.3
2026.04          42.07
                 41.66
2026.04          40.67
2026.04          40.43
2026.04          40.4
2026.04          40
2026.04          37.5