
Commonsense Reasoning on Reasoning Suite Zero-shot Aggregate

73.2 Aggregate Score

Llama-2-13B

(Chart: aggregate score over time, Mar 2025 to Feb 2026)

Evaluation Results

Date      Aggregate Score
2025.03   73.2
2025.03   73.1
2025.03   72.2
2025.03   71.8
2025.03   71.4
2025.03   70.8
2025.03   70.5
2025.03   69.6
2025.03   69
2025.03   68.4
2025.03   67.9
2025.03   67.7
2025.03   67.6
2025.03   67.3
2025.03   66.2
2025.03   66.2
n/a       64.7
2026.02   64.56
2025.03   64.5
2025.03   64.1
2026.02   63.93
2026.02   63.49
2025.03   63
2025.03   61.8
2025.03   61.5
n/a       59.7
2026.02   58.78
2026.02   57.62
2025.03   57.2
2025.03   54.2
2026.02   53.4
2026.02   53.27
2026.02   52.89
2026.02   49.59
2026.02   48.25
2026.02   48.25
2025.03   47.6
2025.03   46.6
2025.03   46.4
2026.02   42.86
2025.03   42.4
2025.03   42.1
2025.03   41.4
2026.02   39.5
2025.03   36.9