Puzzle on Reasoning Gym
Top score: 43.98 Avg@4 (AIPO), May 8, 2026
Updated 22d ago
Evaluation Results
Method    Configuration               Date     Avg@4
AIPO      Model=Qwen2.5-32B-Inst...   2026.05  43.98
GRPO      Model=Qwen2.5-32B-Inst...   2026.05  34.24
SFT       Model=Qwen2.5-32B-Inst...   2026.05  31.13
Original  Model=Qwen2.5-32B-Inst...   2026.05  28.56
AIPO      Student LLM=DeepSeek-R...   2026.05  16.44
GRPO      Student LLM=DeepSeek-R...   2026.05  13.53
SFT       Student LLM=DeepSeek-R...   2026.05  10.98
Original  Student LLM=DeepSeek-R...   2026.05   3.33