Mathematical Reasoning on LiveMathBench v202505 (test)
[Chart: Avg@4 by date. Labeled point: AIPO, 17.5 Avg@4, May 8, 2026. Updated 22d ago.]
Evaluation Results
Method     Configuration               Date      Avg@4
AIPO       Policy Model=Qwen2.5-7...   2026.05   17.5
LUFFY      Policy Model=Qwen2.5-7...   2026.05   15.1
AIPO       Policy Model=Qwen2.5-7...   2026.05   14.9
OPSD       Policy Model=Qwen2.5-7...   2026.05   14.2
Dr.GRPO    Policy Model=Qwen2.5-7...   2026.05   14
GRPO       Policy Model=Qwen2.5-7...   2026.05   13.9
AIPO       Policy Model=Llama3.2-...   2026.05   13.3
LUFFY      Policy Model=Qwen2.5-7...   2026.05   13.2
OPSD       Policy Model=Qwen2.5-7...   2026.05   12.8
SFT        Policy Model=Qwen2.5-7...   2026.05   12.5
PRIME      Policy Model=Qwen2.5-7...   2026.05   11.5
SFT        Policy Model=Qwen2.5-7...   2026.05   11.1
Original   Policy Model=Qwen2.5-7...   2026.05   10.8
AIPO       Policy Model=Llama3.2-...   2026.05   10.8
LUFFY      Policy Model=Llama3.2-...   2026.05   10.8
OPSD       Policy Model=Llama3.2-...   2026.05   9.5
OPSD       Policy Model=Llama3.2-...   2026.05   7
SFT        Policy Model=Llama3.2-...   2026.05   7
SFT        Policy Model=Llama3.2-...   2026.05   5.1
LUFFY      Policy Model=Llama3.2-...   2026.05   4.8
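
The page does not define Avg@4. A common reading, assumed here and not confirmed by this leaderboard, is the mean accuracy over 4 sampled answers per problem, reported as a percentage. The sketch below illustrates that reading; the function name `avg_at_k` and the grading data are hypothetical.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]], k: int = 4) -> float:
    """Mean accuracy in percent over k sampled answers per problem.

    correct[i][j] is True when sample j for problem i was graded correct.
    Assumed definition of Avg@k; the leaderboard does not state one.
    """
    if not correct:
        raise ValueError("need at least one problem")
    for row in correct:
        if len(row) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(row)}")
    # Per-problem accuracy first, then the average across problems.
    per_problem = [sum(row) / k for row in correct]
    return 100.0 * mean(per_problem)


# Hypothetical grading results: 3 problems, 4 samples each.
grades = [
    [True, False, False, True],    # 2 of 4 correct
    [False, False, False, False],  # 0 of 4 correct
    [True, True, True, True],      # 4 of 4 correct
]
print(avg_at_k(grades))  # prints 50.0
```

Under this reading, a score of 17.5 would mean that 17.5% of all sampled answers were graded correct, averaged across the test problems.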