Code Generation on MBPP (Test accuracy %)
[Chart: Test Accuracy (%) by date. Labeled point: StableDRL, 42.4 (May 12, 2026).]
Updated 21d ago
Evaluation Results

| Method | Setting | Date | Test Accuracy (%) |
|---|---|---|---|
| StableDRL | Generation Length=512 | 2026.05 | 42.4 |
| wd1 | Generation Length=512 | 2026.05 | 41.8 |
| GDPO | Generation Length=512 | 2026.05 | 41.6 |
| SPG | Generation Length=512 | 2026.05 | 41.4 |
| SFT | Generation Length=512 | 2026.05 | 41 |
| Diffu-GRPO | Generation Length=512 | 2026.05 | 40.8 |
| d1 | Generation Length=512 | 2026.05 | 40.8 |
| ESPO | Generation Length=512 | 2026.05 | 40.6 |
| LLaDA-8B-Instruct | Generation Length=512 | 2026.05 | 40.4 |
| AdaBlock-dLLM | Generation Length=512 | 2026.05 | 39.6 |
| GDPO | Generation Length=256 | 2026.05 | 35 |
| StableDRL | Generation Length=256 | 2026.05 | 34.8 |
| d1 | Generation Length=256 | 2026.05 | 34.4 |
| SPG | Generation Length=256 | 2026.05 | 34.2 |
| AdaBlock-dLLM | Generation Length=256 | 2026.05 | 34 |
| Diffu-GRPO | Generation Length=256 | 2026.05 | 34 |
| wd1 | Generation Length=256 | 2026.05 | 34 |
| ESPO | Generation Length=256 | 2026.05 | 33.8 |
| LLaDA-8B-Instruct | Generation Length=256 | 2026.05 | 33.6 |
| SFT | Generation Length=256 | 2026.05 | 30.4 |