Mathematical Reasoning on AIME 2025 (Reward-weighted Pass@1)
[Chart: Reward-weighted Pass@1. Top result: AIRL (Interval), 4.3, Oct 2, 2025. Updated 1mo ago.]
Evaluation Results
Method             Model        Date      Reward-weighted Pass@1
AIRL (Interval)    Qwen2.5-7B   2025.10   4.3
AIRL (Sparse)      Qwen2.5-7B   2025.10   3.82
SFT                Qwen2.5-7B   2025.10   3.12
AIRL (Step-wise)   Qwen2.5-7B   2025.10   2.82
AIRL (Dense)       Qwen2.5-7B   2025.10   2.17
AIRL (Sparse)      Qwen2.5-3B   2025.10   1.67
SFT                Qwen2.5-3B   2025.10   1.42
AIRL (Step-wise)   Qwen2.5-3B   2025.10   0.84
AIRL (Interval)    Qwen2.5-3B   2025.10   0.36
AIRL (Dense)       Qwen2.5-3B   2025.10   0.36