Code Generation on BigCodeBench Instruct Full (train)

Last SR: 83.3

APEX-EM + Opus Judge (A5, E10)

[Chart: y-axis ticks 46.484, 56.042, 65.6, 75.158; date label Mar 31, 2026]
Updated 18d ago

Evaluation Results

Date      Last SR   (unlabeled metric)
2026.03   83.3      84
2026.03   81.1      82.7
2026.03   81.1      81.5
2026.03   59.5      62.7
2026.03   57.8      60.2
2026.03   53.9      -
2026.03   53        57.7
2026.03   50        55.8
2026.03   48.5      -
2026.03   47.9      53
2026.03   -         57.7
2026.03   -         58.2