Code Generation on MBPP (Pass@1 Python, Pass@1 Rust)
[Chart: Pass@1 Accuracy (Python) and Pass@1 Accuracy (Rust). Best Python result shown: SYMPHONY-L, 96.5, Jan 30, 2026.]
Updated 1mo ago
Evaluation Results

Method      Variant           Date     Pass@1 Accuracy (Python)  Pass@1 Accuracy (Rust)
SYMPHONY-L  Size=Large        2026.01  96.5                      97.4
SYMPHONY-S  Size=Small        2026.01  92.7                      94.6
AgentCoder                    2026.01  91.8                      -
MASTER                        2026.01  91                        -
AgentVerse                    2026.01  89                        -
MetaGPT                       2026.01  87.7                      -
LATS                          2026.01  81.1                      -
GPT-4                         2026.01  80                        71
Reflexion                     2026.01  77.1                      75.4
RAP                           2026.01  71.4                      -
GPT-4       Prompting=ReAct   2026.01  71                        -
GPT-4       Prompting=CoT     2026.01  68.3                      -
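
For readers unfamiliar with the metric, Pass@1 is conventionally the percentage of benchmark problems for which a single generated solution passes all of that problem's tests. The sketch below illustrates that definition only. It is not the evaluation harness behind the numbers above, and the function name `pass_at_1` and the task ids are hypothetical.

```python
def pass_at_1(results):
    """Return Pass@1 as a percentage.

    `results` maps a problem id to True if the single generated
    solution passed every test for that problem, else False.
    (Hypothetical input format, assumed for illustration.)
    """
    if not results:
        raise ValueError("no results to score")
    # Each True counts as 1, each False as 0.
    return 100.0 * sum(results.values()) / len(results)


# Toy outcomes for four made-up problems (not real benchmark data).
outcomes = {"task_1": True, "task_2": False, "task_3": True, "task_4": True}
print(f"{pass_at_1(outcomes):.1f}")  # prints 75.0
```

A "-" in the table means no score is listed for that language, so it should be treated as missing rather than as zero.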