Code Generation on MultiPL-E MBPP
[Chart: Score over time, Jan 6, 2026 to Feb 11, 2026. Top score: 58.8 (Kimi-K2).]
Updated 1mo ago
Evaluation Results
Method                   Details                     Date      Score
Kimi-K2                  Model Variant=Base, #...    2026.02   58.8
Kimi-K2 Base             # Shots=0-shot, # Acti...   2026.01   58.8
Step 3.5 Flash           Model Variant=Base, #...    2026.02   58
MiMo-V2 Flash            Model Variant=Base, #...    2026.02   56.7
MiMo-V2-Flash Base       # Shots=0-shot, # Acti...   2026.01   56.7
DeepSeek V3.1            Model Variant=Base, #...    2026.02   52.5
DeepSeek-V3.1 Base       # Shots=0-shot, # Acti...   2026.01   52.5
DeepSeek V3.2            Model Variant=Exp Base...   2026.02   50.6
DeepSeek-V3.2 Exp Base   # Shots=0-shot, # Acti...   2026.01   50.6