Commonsense Reasoning on Commonsense Reasoning Benchmarks (HellaSwag, WinoGrande, BoolQ)
[Chart: HellaSwag score by method; top result: QUEST, 73.6 (Apr 13, 2026). Available metrics: HellaSwag, WinoGrande, BoolQ, Average Score.]
Evaluation Results
Method | Configuration              | Date    | HellaSwag | WinoGrande | BoolQ | Average Score
QUEST  | Budget=256, Model=Trad...  | 2026.04 | 73.6      | 63.2       | 75    | 70.6
LoSA   | Budget=128, Model=Trad...  | 2026.04 | 70.8      | 66         | 72.92 | 69.91
LoSA   | Budget=256, Model=Trad...  | 2026.04 | 70.4      | 65.2       | 73.96 | 69.85
Dense  | Budget=–, Model=Trado-...  | 2026.04 | 69.6      | 63.6       | 72.92 | 68.71
QUEST  | Budget=128, Model=Trad...  | 2026.04 | 67.6      | 66.8       | 73.96 | 69.45
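
The reported Average Score values appear consistent with an unweighted mean of the HellaSwag, WinoGrande, and BoolQ scores, rounded to two decimals. The page does not state the aggregation rule, so that is an assumption here. The short Python sketch below recomputes the mean for each row of the results above; the names `ROWS`, `mean_score`, and `check_rows` are illustrative only and do not come from the page.

```python
# Scores copied from the Evaluation Results rows:
# (method, budget, HellaSwag, WinoGrande, BoolQ, reported average)
ROWS = [
    ("QUEST", "256", 73.6, 63.2, 75.0, 70.6),
    ("LoSA", "128", 70.8, 66.0, 72.92, 69.91),
    ("LoSA", "256", 70.4, 65.2, 73.96, 69.85),
    ("Dense", "–", 69.6, 63.6, 72.92, 68.71),
    ("QUEST", "128", 67.6, 66.8, 73.96, 69.45),
]


def mean_score(hellaswag, winogrande, boolq):
    """Unweighted mean of the three scores (assumed aggregation rule)."""
    return (hellaswag + winogrande + boolq) / 3


def check_rows(rows, tol=0.006):
    """Return (method, budget, recomputed mean, matches reported value) per row.

    tol allows for the reported averages being rounded to two decimals.
    """
    results = []
    for method, budget, hs, wg, bq, reported in rows:
        recomputed = mean_score(hs, wg, bq)
        results.append((method, budget, recomputed, abs(recomputed - reported) <= tol))
    return results


if __name__ == "__main__":
    for method, budget, recomputed, ok in check_rows(ROWS):
        status = "matches" if ok else "differs from"
        print(f"{method} (budget {budget}): mean {recomputed:.2f} {status} reported value")
```

If the leaderboard used a weighted or per-example average instead, the recomputed values would drift from the reported ones, which makes this a quick sanity check when copying rows by hand.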