Common-sense Reasoning Average

Average Accuracy: 73.44

FP16

(Chart: average accuracy by date, Mar 2, 2026 to May 30, 2026.)
Updated 1d ago
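The page does not say which common-sense reasoning benchmarks go into this average, or how they are combined. A common convention is an unweighted mean of per-task accuracies. The Python sketch below assumes that convention. The function name and any task names you pass to it are hypothetical and are not taken from the leaderboard.

```python
from statistics import fmean


def average_accuracy(per_task_accuracy: dict[str, float]) -> float:
    """Unweighted mean of per-task accuracies (assumed aggregation rule).

    `per_task_accuracy` maps a benchmark name (hypothetical) to its accuracy,
    on the same scale as the leaderboard score.
    """
    if not per_task_accuracy:
        raise ValueError("need at least one task accuracy")
    # Each task counts equally, no matter how many examples it contains.
    return fmean(per_task_accuracy.values())
```

For example, two hypothetical tasks scored 70.0 and 76.0 average to 73.0. Under an unweighted mean, a small benchmark carries as much weight as a large one. A mean weighted by example count could rank methods differently.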

Evaluation Results

Date      Average Accuracy
2026.05   73.44
2026.05   71.61
2026.05   70.61
2026.05   69.91
2026.05   69.89
2026.05   69.84
2026.05   69.1
2026.05   68.33
2026.05   68.07
2026.05   67.13
2026.05   66.56
2026.05   66.38
2026.05   66.15
2026.05   65.08
2026.05   64.91
2026.05   64.67
2026.05   61.84
2026.05   60.25
2026.05   59.88
2026.05   59.85
2026.05   59.66
2026.05   59.08
2026.05   58.86
2026.03   58.84
2026.03   58.75
2026.03   58.68
2026.03   58.53
2026.03   58.3
2026.03   58.19
2026.03   57.89
2026.03   57.87
2026.03   57.81
2026.03   57.8
2026.05   57.59
2026.03   57.55
2026.05   57.06
2026.05   56.64
2026.05   54.78
2026.05   54.64
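Each entry in the table gives only a submission month (YYYY.MM) and a score. To cross-check the trend shown in the chart, you can group the entries by month. The Python sketch below does this. The function name is illustrative, and the rows are copied from the table.

```python
from collections import defaultdict

# (month, average accuracy) pairs, in the table's order.
ROWS = [
    ("2026.05", 73.44), ("2026.05", 71.61), ("2026.05", 70.61), ("2026.05", 69.91),
    ("2026.05", 69.89), ("2026.05", 69.84), ("2026.05", 69.1), ("2026.05", 68.33),
    ("2026.05", 68.07), ("2026.05", 67.13), ("2026.05", 66.56), ("2026.05", 66.38),
    ("2026.05", 66.15), ("2026.05", 65.08), ("2026.05", 64.91), ("2026.05", 64.67),
    ("2026.05", 61.84), ("2026.05", 60.25), ("2026.05", 59.88), ("2026.05", 59.85),
    ("2026.05", 59.66), ("2026.05", 59.08), ("2026.05", 58.86), ("2026.03", 58.84),
    ("2026.03", 58.75), ("2026.03", 58.68), ("2026.03", 58.53), ("2026.03", 58.3),
    ("2026.03", 58.19), ("2026.03", 57.89), ("2026.03", 57.87), ("2026.03", 57.81),
    ("2026.03", 57.8), ("2026.05", 57.59), ("2026.03", 57.55), ("2026.05", 57.06),
    ("2026.05", 56.64), ("2026.05", 54.78), ("2026.05", 54.64),
]


def summarize_by_month(rows: list[tuple[str, float]]) -> dict[str, tuple[int, float]]:
    """Return {month: (number of entries, best score)} for each month."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for month, score in rows:
        grouped[month].append(score)
    # 'YYYY.MM' strings sort chronologically as plain text.
    return {m: (len(s), max(s)) for m, s in sorted(grouped.items())}


print(summarize_by_month(ROWS))
# {'2026.03': (11, 58.84), '2026.05': (28, 73.44)}
```

The output shows 11 entries dated March 2026, with a best score of 58.84, and 28 entries dated May 2026, with a best score of 73.44.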