Code Generation on MBPP (Accuracy, AVG, Improvement Overhead)
[Chart: Accuracy over time. Top result: Qwen3-235B-A22B, 81.2 (Aug 19, 2025). Available metrics: Accuracy, Average Metric (AVG), Improvement Overhead.]
Updated 1mo ago
Evaluation Results
| Method | Framework | Date | Accuracy | Average Metric (AVG) | Improvement Overhead |
| --- | --- | --- | --- | --- | --- |
| Qwen3-235B-A22B | Reference Mo... | 2025.08 | 81.2 | 78.22 | - |
| COCO Qwen3-8B with coco(Qwen3-8B) | COCO Framewo... | 2025.08 | 76.6 | 74.18 | 6.2 |
| COCO Qwen3-8B with coco(Llama-3.1-8B) | COCO Framewo... | 2025.08 | 74.4 | 74.37 | 6.5 |
| Qwen3-8B | Reference Mo... | 2025.08 | 66 | 68.52 | - |
| Aflow-Qwen3-8B | Multi-Agent... | 2025.08 | 65 | 69.86 | - |
| COCO Llama-3.1-8B with coco(Qwen3-8B) | COCO Framewo... | 2025.08 | 60.4 | 63.59 | 9.5 |
| Llama-3.1-8B | Reference Mo... | 2025.08 | 60.2 | 55.48 | - |
| Aflow-Llama3.1-8B | Multi-Agent... | 2025.08 | 55.8 | 58.09 | - |
| COCO Llama-3.1-8B with coco(Llama-3.1-8B) | COCO Framewo... | 2025.08 | 55.4 | 58.46 | 0.63 |