Agentic Coding on SWE-Bench Multilingual
[Chart: accuracy over time, Jan 6, 2026 to May 26, 2026. Top result: MiMo-V2-Flash, 71.7 accuracy. Updated 6d ago.]
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| MiMo-V2-Flash | Variant=Flash | 2026.01 | 71.7 |
| DeepSeek-V3.2 Thinking | Thinking Mode=true | 2026.01 | 70.2 |
| Claude Sonnet 4.5 | Variant=Sonnet 4.5 | 2026.01 | 68 |
| Qwen3.6 | # Total Params=35B, #... | 2026.05 | 67.2 |
| Kimi-K2 Thinking | Thinking Mode=true | 2026.01 | 61.1 |
| Qwen3.5 | # Total Params=35B, #... | 2026.05 | 60.3 |
| LAGUNA XS.2 | # Total Params=33.4B,... | 2026.05 | 57.7 |
| Devstral Small 2 | # Total Params=24B, #... | 2026.05 | 55.7 |
| GPT-5 High | Variant=High | 2026.01 | 55.3 |
| Gemma 4 | # Total Params=31B, #... | 2026.05 | 51.7 |
| LongCat-Flash-Lite | Architecture=MoE + NE,... | 2026.01 | 38.1 |
| Kimi-Linear-48B-A3B | Architecture=MoE, # To... | 2026.01 | 37.2 |
| Qwen3-Next-80B-A3B-Instruct | Architecture=MoE, # To... | 2026.01 | 31.3 |