Software Engineering on SWE-Bench Verified (Reproduced/Reported)
[Chart: Reproduced Success Rate and Reported Success Rate over time. Highlighted point: ProRL Agent-14B (RL), 23.6 Reproduced Success Rate, Mar 19, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Reproduced Success Rate | Reported Success Rate |
| --- | --- | --- | --- | --- |
| ProRL Agent-14B (RL) | Size=14B, Thinking mod... | 2026.03 | 23.6 | - |
| ProRL Agent-4B (RL) | Size=4B | 2026.03 | 21.2 | - |
| ProRL Agent-8B (RL) | Size=8B, Thinking mode... | 2026.03 | 18 | - |
| Qwen3-14B | Size=14B, Thinking mod... | 2026.03 | 15.4 | - |
| Qwen3-4B-Instruct-2507 | Size=4B | 2026.03 | 14.8 | - |
| Qwen3-8B | Size=8B, Thinking mode... | 2026.03 | 9.6 | - |
| SkyRL-Agent-8B-v0 | Size=8B | 2026.03 | - | 9.4 |
| SkyRL-Agent-14B-v0 | Size=14B | 2026.03 | - | 21.6 |