Software Engineering Issue Resolution on SWE-Bench Multilingual
[Chart: Resolve Rate over time, Dec 26, 2025 to Apr 14, 2026. Highlighted result: Nemotron 3 Super, Resolve Rate 45.78.]
Updated 4d ago
Evaluation Results
Method                Configuration              Date     Resolve Rate
Nemotron 3 Super      Framework/Scaffold=Ope...  2026.04  45.78
GPT-OSS-120B          Framework/Scaffold=Ope...  2026.04  30.8
Hybrid                Feedback=Hybrid (Execu...  2025.12  0.357
Execution-based only  Feedback=Execution-based   2025.12  0.333
Execution-free only   Feedback=Execution-fre...  2025.12  0.33
Poor Calibrated RM    Feedback=Ablated RM (P...  2025.12  0.21