Commonsense Question Answering on CSQA (OOD)
Top result: R1 Distill -> GRPO, 63.8 accuracy (Dec 3, 2025)
Evaluation Results
Method                  Training     Date      Accuracy
R1 Distill -> GRPO      SFT + GRPO   2025.12   63.8
BOLT -> GRPO            SFT + GRPO   2025.12   62.8
RL-Only                 RL           2025.12   62.6
SkillFactory -> GRPO    SFT + GRPO   2025.12   60.8
STaR -> GRPO            SFT + GRPO   2025.12   60.5
R1 Distill              SFT          2025.12   56.6
Qwen2.5 1.5B Instruct   None         2025.12   55.7
STaR                    SFT          2025.12   55.4
SkillFactory            SFT          2025.12   47.1
BOLT                    SFT          2025.12   46.7