Code on MBPP (Score)
Top score: 93.77 (Qwen3.5-9B + AR-SFT), Jun 1, 2026
Updated 20h ago
Evaluation Results
| Method | Details | Date | Score |
|---|---|---|---|
| Qwen3.5-9B + AR-SFT | Model Size=9B, Trainin... | 2026.06 | 93.77 |
| Qwen3.5-4B + AR-SFT | Model Size=4B, Trainin... | 2026.06 | 91.83 |
| FLARE-9B | Model Size=9B, Trainin... | 2026.06 | 91.05 |
| FLARE-4B | Model Size=4B, Trainin... | 2026.06 | 89.11 |
| Qwen3.5-9B (released) | Model Size=9B, Trainin... | 2026.06 | 89.11 |
| Qwen3.5-4B (released) | Model Size=4B, Trainin... | 2026.06 | 82.49 |
| Qwen3.5-2B + AR-SFT | Model Size=2B, Trainin... | 2026.06 | 71.6 |
| FLARE-2B | Model Size=2B, Trainin... | 2026.06 | 68.09 |
| Qwen3.5-2B (released) | Model Size=2B, Trainin... | 2026.06 | 53.31 |