Commonsense Reasoning on HellaSwag (first 1000 examples)
[Chart: Accuracy (HellaSwag 1k). Highlighted point: k=64 (DR only), 38, Apr 20, 2026. Updated 1mo ago.]
Evaluation Results
Method          Details                    Date     Accuracy (HellaSwag 1k)
k=64 (DR only)  Model size=300M, Token...  2026.04  38
k=64 + sink     Model size=300M, Token...  2026.04  36.5
k=0 (baseline)  Model size=300M, Token...  2026.04  35.7