Commonsense Reasoning on HellaSwag (Acc %)
[Chart: HellaSwag Accuracy. Highlighted point: Before RL, 70.16, May 12, 2026. Updated 21d ago.]
Evaluation Results
Method      Model           Date      HellaSwag Accuracy (%)
Before RL   Qwen3-30B-A3B   2026.05   70.16
Vanilla RL  Qwen3-30B-A3B   2026.05   69.53
RL+SAE      Qwen3-30B-A3B   2026.05   69.21
RL+SAE      Qwen3-8B        2026.05   61.97
Before RL   Qwen3-8B        2026.05   61.02
Vanilla RL  Qwen3-8B        2026.05   60.85
RL+SAE      Qwen3-1.7B      2026.05   40.93
Vanilla RL  Qwen3-1.7B      2026.05   39.99
Before RL   Qwen3-1.7B      2026.05   39.66
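
The results are easier to compare as a change from each model's "Before RL" score. The following is a minimal sketch of that arithmetic in Python, using only the scores listed above. The dictionary layout and the helper name `deltas_vs_baseline` are illustrative assumptions, not part of the leaderboard.

```python
# HellaSwag accuracy (%) copied from the Evaluation Results above.
# The nested-dict layout is an assumption made for this sketch.
scores = {
    "Qwen3-30B-A3B": {"Before RL": 70.16, "Vanilla RL": 69.53, "RL+SAE": 69.21},
    "Qwen3-8B": {"Before RL": 61.02, "Vanilla RL": 60.85, "RL+SAE": 61.97},
    "Qwen3-1.7B": {"Before RL": 39.66, "Vanilla RL": 39.99, "RL+SAE": 40.93},
}


def deltas_vs_baseline(scores, baseline="Before RL"):
    """Return each method's accuracy minus the baseline, in percentage points.

    Hypothetical helper: the baseline row itself is omitted from the output.
    """
    result = {}
    for model, by_method in scores.items():
        base = by_method[baseline]
        result[model] = {
            method: round(value - base, 2)
            for method, value in by_method.items()
            if method != baseline
        }
    return result


deltas = deltas_vs_baseline(scores)

# Print one line per (model, method) pair with an explicit sign.
for model, by_method in deltas.items():
    for method, delta in by_method.items():
        print(f"{model:14s} {method:10s} {delta:+.2f}")
```

On these numbers, both RL variants score slightly below the baseline for Qwen3-30B-A3B, while RL+SAE scores above it for Qwen3-8B and Qwen3-1.7B. All differences are under 1.3 percentage points. The page reports no variance or confidence intervals, so this sketch does not show whether any of the differences are significant.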