Code Generation on MBPP+ (avg@32)
Top result: Code-A1, 74.5 Average Success Rate (@32)
[Chart: Average Success Rate (@32); date shown: Mar 16, 2026]
Updated 1mo ago
Evaluation Results
Method | Method details | Date | Average Success Rate (@32)
Code-A1 | Code LLM=Qwen2.5-Coder... | 2026.03 | 74.5
Self-Play | Code LLM=Qwen2.5-Coder... | 2026.03 | 74.23
Golden Tests | Code LLM=Qwen2.5-Coder... | 2026.03 | 74.16
/ | Code LLM=Qwen2.5-Coder... | 2026.03 | 71.95
Code-A1 | Code LLM=Qwen2.5-Coder... | 2026.03 | 69.07
Golden Tests | Code LLM=Qwen2.5-Coder... | 2026.03 | 68.05
Self-Play | Code LLM=Qwen2.5-Coder... | 2026.03 | 67.06
Self-Play | Code LLM=Qwen2.5-Coder... | 2026.03 | 63.54
Code-A1 | Code LLM=Qwen2.5-Coder... | 2026.03 | 63.33
Golden Tests | Code LLM=Qwen2.5-Coder... | 2026.03 | 63.3
/ | Code LLM=Qwen2.5-Coder... | 2026.03 | 63.12
/ | Code LLM=Qwen2.5-Coder... | 2026.03 | 60.87