Mathematical Reasoning on AIME 2024 (Reward-weighted Pass@1)
[Chart: Reward-weighted Pass@1 over time. Plotted point: SFT, 3.45, Oct 2, 2025.]
Evaluation Results
Method             Model         Date      Reward-weighted Pass@1
-----------------  ------------  --------  ----------------------
SFT                Qwen2.5-7B    2025.10   3.45
AIRL (Sparse)      Qwen2.5-7B    2025.10   3.43
AIRL (Step-wise)   Qwen2.5-7B    2025.10   3.41
AIRL (Interval)    Qwen2.5-7B    2025.10   3.31
AIRL (Dense)       Qwen2.5-7B    2025.10   3
SFT                Qwen2.5-3B    2025.10   2.59
AIRL (Sparse)      Qwen2.5-3B    2025.10   1.21
AIRL (Dense)       Qwen2.5-3B    2025.10   1.11
AIRL (Step-wise)   Qwen2.5-3B    2025.10   1.05
AIRL (Interval)    Qwen2.5-3B    2025.10   0.86