Commonsense Reasoning on Commonsense Reasoning Suite (OBQA, HellaSwag, ARC-E, WSC, Winogrande, BoolQ, PIQA)
[Chart: Average Accuracy by date. MUON+ at 49.4, Feb 25, 2026.]
Updated 19d ago
Evaluation Results
| Method | Details | Date | Average Accuracy | OBQA Accuracy | HellaSwag Accuracy | ARC-E Accuracy | WSC Accuracy | Winogrande Accuracy | BoolQ Accuracy | PIQA Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| MUON+ | Backbone=GPT-Base, Che... | 2026.02 | 49.4 | 32 | 48 | 47.1 | 36.5 | 53.4 | 57.7 | 71.2 |
| Muon | Backbone=GPT-Base, Che... | 2026.02 | 48.1 | 30.6 | 44.6 | 44.4 | 37.5 | 52.2 | 57.4 | 70 |
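The reported Average Accuracy values appear consistent with an unweighted mean of the seven per-task accuracies, rounded to one decimal place. The page does not state how the average is computed, so treat that as an assumption. A minimal sketch of the check, where the `scores` dictionary and the `average_accuracy` helper are illustrative names not taken from the page:

```python
from statistics import mean

# Per-task accuracies copied from the results above, in the order
# OBQA, HellaSwag, ARC-E, WSC, Winogrande, BoolQ, PIQA.
scores = {
    "MUON+": [32.0, 48.0, 47.1, 36.5, 53.4, 57.7, 71.2],
    "Muon": [30.6, 44.6, 44.4, 37.5, 52.2, 57.4, 70.0],
}


def average_accuracy(per_task):
    """Unweighted mean over tasks, rounded to one decimal.

    Assumption: the leaderboard weights every task equally; the page
    itself does not document the aggregation rule.
    """
    return round(mean(per_task), 1)


averages = {name: average_accuracy(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

Under this assumption the recomputed averages match the Average Accuracy values listed for both methods.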