Code Generation on HumanEval+, MBPP+, and BigCodeBench Aggregate

Average Score: 70.72

Code-A1

Updated 4mo ago

Evaluation Results

Method	Links	Average Score
Code-A1 2026.03		70.72
Self-Play 2026.03		70.39
Golden Tests 2026.03		70.37
/ 2026.03		68.35
Code-A1 2026.03		66.15
Golden Tests 2026.03		65.14
Self-Play 2026.03		64.67
/ 2026.03		60.84
Code-A1 2026.03		56.95
Golden Tests 2026.03		56.23
Self-Play 2026.03		55.88
/ 2026.03		51.21