
Commonsense Reasoning on Reasoning Suite Zero-shot Aggregate

Aggregate Score: 73.2

Llama-2-13B

[Chart: Aggregate Score over time, Mar 17, 2025 to Feb 8, 2026]
Updated 1mo ago

Evaluation Results

Date      Aggregate Score
2025.03   73.2
2025.03   73.1
2025.03   72.2
2025.03   71.8
2025.03   71.4
2025.03   70.8
2025.03   70.5
2025.03   69.6
2025.03   69
2025.03   68.4
2025.03   67.9
2025.03   67.7
2025.03   67.6
2025.03   67.3
2025.03   66.2
2025.03   66.2
-         64.7
2026.02   64.56
2025.03   64.5
2025.03   64.1
2026.02   63.93
2026.02   63.49
2025.03   63
2025.03   61.8
2025.03   61.5
-         59.7
2026.02   58.78
2026.02   57.62
2025.03   57.2
2025.03   54.2
2026.02   53.4
2026.02   53.27
2026.02   52.89
2026.02   49.59
2026.02   48.25
2026.02   48.25
2025.03   47.6
2025.03   46.6
2025.03   46.4
2026.02   42.86
2025.03   42.4
2025.03   42.1
2025.03   41.4
2026.02   39.5
2025.03   36.9