Software Engineering on SWE Verified HIGH reasoning
Top result: gpt-oss-20b, 60.7 Accuracy
[Chart: Accuracy and Accuracy 95% CI Lower Bound by date; date shown: Apr 1, 2026]
Updated 17d ago
Evaluation Results

Method         Method details               Date      Accuracy   Accuracy 95% CI Lower Bound
gpt-oss-20b    Evaluation Harness=Ope...    2026.04   60.7       -
HarmonyAgent   Evaluation Harness=Har...    2026.04   60.4       56.2