Software Engineering on SWE-Bench Verified (SWE-Agent Metrics)
[Chart legend: SWE-Agent Score, MiniSWE-Agent Score, OpenHands Score. Highlighted point: Claude-Opus-4.5, SWE-Agent Score 78.2, Feb 28, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | SWE-Agent Score | MiniSWE-Agent Score | OpenHands Score |
|---|---|---|---|---|---|
| Claude-Opus-4.5 | Size=?, Maximum agent... | 2026.02 | 78.2 | 77.8 | 79 |
| Claude-Sonnet-4.5 | Size=?, Maximum agent... | 2026.02 | 76 | 68.4 | 74.6 |
| MiniMax-M2.1 | Size=230A10, Maximum a... | 2026.02 | 74.8 | 70.4 | 71 |
| GLM-4.7 | Size=358A32, Maximum a... | 2026.02 | 74.2 | 70.4 | 70.6 |
| Kimi-K2.5 | Size=1000A32, Maximum... | 2026.02 | 73.2 | 70.8 | - |
| Qwen3-Coder-Next | Size=80A3, Maximum age... | 2026.02 | 70.6 | 71.1 | 71.3 |
| DeepSeek-V3.2 | Size=671A37, Maximum a... | 2026.02 | 70.2 | 67.2 | 72.6 |
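
The three score columns above come from different agent harnesses, so comparing methods across all of them takes a little arithmetic. The following is a minimal Python sketch, not part of the leaderboard itself, that averages each method's available scores and reports the gap between its best and worst harness. The names `SCORES`, `summarize`, and `rank_by_mean` are hypothetical, and treating the "-" entry as missing (rather than zero) is an assumption.

```python
from statistics import mean

# Scores transcribed from the results listed above.
# Tuple order: (SWE-Agent, MiniSWE-Agent, OpenHands).
# None stands for the "-" entry (assumed to mean "not reported").
SCORES = {
    "Claude-Opus-4.5": (78.2, 77.8, 79.0),
    "Claude-Sonnet-4.5": (76.0, 68.4, 74.6),
    "MiniMax-M2.1": (74.8, 70.4, 71.0),
    "GLM-4.7": (74.2, 70.4, 70.6),
    "Kimi-K2.5": (73.2, 70.8, None),
    "Qwen3-Coder-Next": (70.6, 71.1, 71.3),
    "DeepSeek-V3.2": (70.2, 67.2, 72.6),
}


def summarize(scores):
    """Return (mean, spread) over the scores that are present."""
    present = [s for s in scores if s is not None]
    return mean(present), max(present) - min(present)


def rank_by_mean(table):
    """Sort method names by mean available score, highest first."""
    return sorted(table, key=lambda name: summarize(table[name])[0], reverse=True)


if __name__ == "__main__":
    for name in rank_by_mean(SCORES):
        avg, spread = summarize(SCORES[name])
        print(f"{name:20s} mean={avg:.2f} spread={spread:.1f}")
```

A mean computed over two harnesses (as for Kimi-K2.5) is not strictly comparable to one computed over three, so the ranking is best read as a rough guide. The spread column is arguably the more useful output: it shows how much a method's score depends on the harness used.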