
Average 7 Commonsense Reasoning Tasks

72.04 Avg Accuracy

w/o pruning

[Chart: Avg Accuracy over time, Feb 18, 2025 to Apr 2, 2026]
Updated 15d ago

Evaluation Results

Date      Avg Accuracy   Method   Links
2025.02   72.04
2025.02   71.73
2025.02   70.87
2025.02   70.81
2025.02   70.58
2025.02   70.49
2025.02   70.05
2025.02   69.14
2025.02   68.62
2025.02   68.24
2025.02   67.95
2025.02   67.95
2025.02   67.58
2025.02   67.42
2025.02   67.24
2025.02   67.08
2025.02   66.93
2025.02   66.82
2025.02   66.48
2025.02   66.37
2025.02   65.81
2025.02   65.73
2025.02   65.34
2025.02   65.21
2025.02   65.13
2025.02   64.83
2025.02   64.78
2025.02   64.29
2025.02   64.28
2025.02   64.25
2025.02   64.25
2025.02   63.92
2025.02   63.85
2025.02   63.75
2025.02   63.75
2025.02   63.63
2025.02   62.98
2025.02   62.71
2025.02   62.3
2025.02   62.1
2025.02   61.97
2025.02   61.56
2025.02   60.98
2025.02   60.97
2025.02   60.57
2025.02   60.19
2026.04   60
2026.04   60
2025.02   59.45
2026.04   58
2026.04   55
2026.04   53
2026.04   53
2026.04   50
2026.04   50
2026.04   48
2026.04   48
2026.04   47
2026.04   46
2026.04   44
2026.04   44
2026.04   44
2026.04   43
2026.04   41
2026.04   39
2026.04   38
2026.04   35
2026.04   35
2026.04   33
2026.04   32
2026.04   32
2026.04   30