
Code Generation on MBPP (Pass@1 Python, Pass@1 Rust)

96.5 Pass@1 Accuracy (Python)

SYMPHONY-L


Evaluation Results

Method	Date	Pass@1 (Python)	Pass@1 (Rust)
SYMPHONY-L	2026.01	96.5	97.4
SYMPHONY-S	2026.01	92.7	94.6
AgentCoder	2026.01	91.8	-
MASTER	2026.01	91	-
AgentVerse	2026.01	89	-
MetaGPT	2026.01	87.7	-
LATS	2026.01	81.1	-
GPT-4	2026.01	80	71
Reflexion	2026.01	77.1	75.4
RAP	2026.01	71.4	-
GPT-4	2026.01	71	-
GPT-4	2026.01	68.3	-
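The leaderboard does not state how each entry's Pass@1 score was computed. A common choice for code-generation benchmarks is the unbiased pass@k estimator popularized by the Codex/HumanEval paper: for a problem with n generated samples of which c pass all unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. For k = 1 this reduces to c/n. The sketch below assumes that estimator; the function names `pass_at_k` and `benchmark_pass_at_k` and the sample results are hypothetical and do not come from any entry in the table.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass all tests."""
    if n - c < k:
        # Every size-k subset of samples must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean per-problem pass@k, expressed as a percentage like the leaderboard scores."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical per-problem results: (samples generated, samples passing).
results = [(1, 1), (1, 1), (1, 0), (1, 1)]
print(benchmark_pass_at_k(results, k=1))  # 75.0
```

With one sample per problem, Pass@1 is simply the fraction of problems whose single generated solution passes its tests; drawing several samples per problem and using the estimator reduces variance without changing the expected value.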