Reasoning on OOD Reasoning Benchmarks Average
[Chart: Average Score over time; top result 61.8 (Qwen2.5-32B-Instruct + Bootcamp-SFT-RL), Aug 12, 2025]
Updated 13d ago
Evaluation Results
Method                                      Links                       Date      Average Score
Qwen2.5-32B-Instruct + Bootcamp-SFT-RL      Model=Qwen2.5-32B-Inst...   2025.08   61.8
DS-R1-Distilled-Qwen-32B + Bootcamp-RL      Model=DS-R1-Distilled-...   2025.08   56.9
Qwen2.5-32B-Instruct + Bootcamp-SFT         Model=Qwen2.5-32B-Inst...   2025.08   53.2
DS-R1-Distilled-Qwen-32B                    Model=DS-R1-Distilled-...   2025.08   52.5
Qwen2.5-32B-Instruct + Bootcamp-RL          Model=Qwen2.5-32B-Inst...   2025.08   43
Qwen2.5-32B-Instruct                        Model=Qwen2.5-32B-Inst...   2025.08   42.3