Code Generation on MBPP OOD Generalization (trained on CodeContests)
[Chart: Pass@128. Leading entry: Base (No-train), 85.45, Mar 17, 2026]
Evaluation Results
Method           Backbone      Date     Pass@128
Base (No-train)  Qwen2.5-32B   2026.03  85.45
LEAFE            Qwen2.5-32B   2026.03  85.45
LEAFE            Qwen2.5-72B   2026.03  85.13
Base (No-train)  Qwen2.5-72B   2026.03  83.33
GRPO             Qwen2.5-32B   2026.03  81.22
GRPO             Qwen2.5-72B   2026.03  81.22
LEAFE            Llama3-70B    2026.03  79.63
Base (No-train)  Llama3-70B    2026.03  78.31
GRPO             Llama3-70B    2026.03  74.07