Code Generation on MultiPL-E MBPP
[Chart: Score over time, Jan 6, 2026 to Feb 11, 2026. Top score: 58.8 (Kimi-K2). Updated 4d ago]
Evaluation Results
Method                  Details                   Date     Score
Kimi-K2                 Model Variant=Base, #...  2026.02  58.8
Kimi-K2 Base            # Shots=0-shot, # Acti... 2026.01  58.8
Step 3.5 Flash          Model Variant=Base, #...  2026.02  58
MiMo-V2 Flash           Model Variant=Base, #...  2026.02  56.7
MiMo-V2-Flash Base      # Shots=0-shot, # Acti... 2026.01  56.7
DeepSeek V3.1           Model Variant=Base, #...  2026.02  52.5
DeepSeek-V3.1 Base      # Shots=0-shot, # Acti... 2026.01  52.5
DeepSeek V3.2           Model Variant=Exp Base... 2026.02  50.6
DeepSeek-V3.2 Exp Base  # Shots=0-shot, # Acti... 2026.01  50.6