Code Generation on MBPP (% Avg@4)
[Chart: Avg@4 (%) over time. Plotted point: GRPO + RePro, 77.5, Dec 1, 2025.]
Updated 4d ago
Evaluation Results
Method            Backbone            Date      Avg@4 (%)
GRPO + RePro      Hunyuan-1.8B-...    2025.12   77.5
RF++ B + RePro    Hunyuan-1.8B-...    2025.12   77
RF++ B            Hunyuan-1.8B-...    2025.12   76.3
PPO + RePro       Hunyuan-1.8B-...    2025.12   76.2
GRPO              Hunyuan-1.8B-...    2025.12   75.9
PPO               Hunyuan-1.8B-...    2025.12   75.8
Original          Hunyuan-1.8B-...    2025.12   73.1
RF++ B + RePro    Qwen3-1.7B          2025.12   71.7
PPO + RePro       Qwen3-1.7B          2025.12   69.1
GRPO + RePro      Qwen3-1.7B          2025.12   68.8
PPO               Qwen3-1.7B          2025.12   68.2
GRPO              Qwen3-1.7B          2025.12   67.5
Original          Qwen3-1.7B          2025.12   66.9
RF++ B            Qwen3-1.7B          2025.12   66.1
PPO + RePro       MobileLLM-R1-...    2025.12   65.1
PPO               MobileLLM-R1-...    2025.12   63.6
Original          MobileLLM-R1-...    2025.12   58.2