
Commonsense Reasoning on Reasoning Suite Zero-shot Aggregate

Aggregate Score: 73.2

Llama-2-13B

[Chart: aggregate score over time, Mar 17, 2025 to May 21, 2026]
Updated 9d ago

Evaluation Results

Date      Aggregate Score
2025.03   73.2
2025.03   73.1
2025.03   72.2
2025.03   71.8
2025.03   71.4
2025.03   70.8
2025.03   70.5
2025.03   69.6
2025.03   69
2025.03   68.4
2025.03   67.9
2025.03   67.7
2025.03   67.6
2025.03   67.3
2025.03   66.2
2025.03   66.2
2026.05   66.01
-         64.7
2026.02   64.56
2025.03   64.5
2025.03   64.1
2026.02   63.93
2026.05   63.89
2026.02   63.49
2025.03   63
2025.03   61.8
2025.03   61.5
-         59.7
2026.05   59.49
2026.02   58.78
2026.02   57.62
2026.05   57.24
2025.03   57.2
2025.03   54.2
2026.05   54.06
2026.02   53.4
2026.02   53.27
2026.02   52.89
2026.02   49.59
2026.02   48.25
2026.02   48.25
2025.03   47.6
2025.03   46.6
2025.03   46.4
2026.02   42.86
2025.03   42.4
2025.03   42.1
2025.03   41.4
2026.02   39.5
2025.03   36.9