Software Engineering on SWE-bench Verified 500 issues (Protocol A)
Chart: Success Rate (SR). Top result: 73.6, OpenHands CodeAct + GBT-SE (Jan 30, 2026).
Updated 1mo ago
Evaluation Results
| Method | Config | Date | Success Rate (SR) | Coverage (Cov) | Violations Rate (Viol) | Untestable Success Rate (USucc) | Tokens Used (Tok) | Characters Used (Chars) |
|---|---|---|---|---|---|---|---|---|
| OpenHands CodeAct + GBT-SE | Backbone=gpt-4o, Proto... | 2026.01 | 73.6 | 86 | 0.2 | 0 | 126 | 490 |
| OpenHands CodeAct + GBT-Basic | Backbone=gpt-4o, Proto... | 2026.01 | 50.2 | 84.4 | 0.4 | 0.2 | 148 | 570 |
| OpenHands CodeAct + Global guardrail only | Backbone=gpt-4o, Proto... | 2026.01 | 38.8 | - | 0.8 | 0.2 | 196 | 770 |
| OpenHands CodeAct (gpt-4o, native) | Backbone=gpt-4o, Proto... | 2026.01 | 34.6 | - | 2.8 | 1.2 | 208 | 820 |