Commonsense Reasoning on Commonsense Reasoning Suite (OBQA, HellaSwag, ARC-E, WSC, Winogrande, BoolQ, PIQA)

Average Accuracy: 49.4

MUON+

[Chart: Average Accuracy by date; y-axis ticks 48.048, 48.399, 48.75, 49.101; Feb 25, 2026]
Updated 19d ago

Evaluation Results

Date     Average  OBQA  HellaSwag  ARC-E  WSC   Winogrande  BoolQ  PIQA  Method
2026.02  49.4     32    48         47.1   36.5  53.4        57.7   71.2  MUON+
2026.02  48.1     30.6  44.6       44.4   37.5  52.2        57.4   70
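
The Average Accuracy in each result row appears to be the unweighted mean of the seven per-task accuracies. The following is a minimal sketch that checks this, assuming each row holds the average followed by seven task scores; the names `rows` and `average_accuracy` are illustrative and not part of the leaderboard.

```python
# Per-task accuracies as read from the two result rows
# (assumption: seven task scores per row, following the leading average).
rows = {
    "row 1": [32.0, 48.0, 47.1, 36.5, 53.4, 57.7, 71.2],
    "row 2": [30.6, 44.6, 44.4, 37.5, 52.2, 57.4, 70.0],
}


def average_accuracy(scores):
    """Unweighted mean of the per-task accuracies, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


averages = {name: average_accuracy(scores) for name, scores in rows.items()}

for name, value in averages.items():
    print(name, value)
# row 1 49.4
# row 2 48.1
```

Both computed means match the reported averages (49.4 and 48.1), which supports reading each row as one average plus seven task scores.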