Software Task Solving on SWE-bench Verified
Chart: Succ/Mtok (All) over time (x-axis through Apr 28, 2026); highlighted point: AHE, 1.64. Per-repository views: django, sympy, sphinx-doc, matplotlib, scikit-learn, pydata, astropy.
Updated 1mo ago
Evaluation Results
Metric: Succ/Mtok, overall (All) and per repository.

| Method  | Date    | All  | django | sympy | sphinx-doc | matplotlib | scikit-learn | pydata | astropy |
|---------|---------|------|--------|-------|------------|------------|--------------|--------|---------|
| AHE     | 2026.04 | 1.64 | 1.67   | 1.48  | 1.07       | 1.88       | 3.4          | 2.15   | 1.81    |
| NexAU0  | 2026.04 | 1.43 | 1.5    | 1.43  | 0.93       | 1.51       | 3.06         | 2      | 0.82    |
| TF-GRPO | 2026.04 | 1.27 | 1.35   | 1.19  | 0.78       | 1.33       | 2.48         | 1.5    | 1.26    |
| ACE     | 2026.04 | 1.1  | 1.12   | 1.15  | 0.62       | 1.14       | 2.08         | 1.37   | 1.08    |
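A minimal sketch for re-deriving the rankings from the Evaluation Results table: it hard-codes the published Succ/Mtok values and reports, for each column, the leading method and its relative margin over the runner-up. The page does not define Succ/Mtok, so the script assumes only that a higher value is better (consistent with the table's descending order on All); the helper names `ranking` and `leader_margin` are hypothetical and not part of any benchmark tooling.

```python
# Succ/Mtok values transcribed from the Evaluation Results table.
# Assumption: higher is better (the page does not define the metric).
COLUMNS = ["All", "django", "sympy", "sphinx-doc", "matplotlib",
           "scikit-learn", "pydata", "astropy"]

RESULTS = {
    "AHE":     [1.64, 1.67, 1.48, 1.07, 1.88, 3.4, 2.15, 1.81],
    "NexAU0":  [1.43, 1.5, 1.43, 0.93, 1.51, 3.06, 2, 0.82],
    "TF-GRPO": [1.27, 1.35, 1.19, 0.78, 1.33, 2.48, 1.5, 1.26],
    "ACE":     [1.1, 1.12, 1.15, 0.62, 1.14, 2.08, 1.37, 1.08],
}


def ranking(column: str) -> list[tuple[str, float]]:
    """Hypothetical helper: (method, score) pairs for one column, best first."""
    idx = COLUMNS.index(column)
    return sorted(((method, scores[idx]) for method, scores in RESULTS.items()),
                  key=lambda pair: pair[1], reverse=True)


def leader_margin(column: str) -> float:
    """Hypothetical helper: relative lead of the top method over the runner-up."""
    (_, best), (_, second) = ranking(column)[:2]
    return (best - second) / second


if __name__ == "__main__":
    for col in COLUMNS:
        top = ranking(col)
        print(f"{col:>12}: {top[0][0]} leads by {leader_margin(col):.1%} "
              f"over {top[1][0]}")
```

Run as a script, this prints one line per column; for All it reports AHE leading NexAU0 by 14.7%. AHE holds the top value in every column, and on astropy the runner-up is TF-GRPO rather than NexAU0.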