Trajectory Verification on SWE-bench Verified (Qwen3-Coder-Max trajectories)
Top result: SWE-RM-30A3B, RM@32 = 74.6
[Figure: RM@32 results over time (metric options: RM@32, AUC, ECE); plotted date: Dec 26, 2025]
Evaluation Results
Method         Type                    Date     RM@32   AUC     ECE
SWE-RM-30A3B   EF (execution-free)     2025.12  74.6    0.768   0.047
DEEP SWE       EB (execution-based)    2025.12  67.6    -       -
DEEP SWE       EF (execution-free)     2025.12  66.2    0.74    0.139
SWE-GYM        EF (execution-free)     2025.12  65.4    0.752   0.283
AGENTLESS      EB (execution-based)    2025.12  65      -       -

("-" = not reported)
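The leaderboard reports RM@32, AUC, and ECE without defining them. The sketch below shows one plausible reading of each, and it rests on assumptions rather than on the benchmark's own evaluation code. RM@K is taken to be the percentage of tasks resolved when the verifier picks its top-scored trajectory out of K candidates. AUC is taken to be the ROC AUC for separating resolved from unresolved trajectories. ECE is taken to be the expected calibration error over 10 equal-width bins of the verifier's predicted probability of resolution. The function names `rm_at_k`, `roc_auc`, and `ece` and the 10-bin default are hypothetical.

```python
from typing import Sequence


def rm_at_k(scores: Sequence[Sequence[float]], resolved: Sequence[Sequence[bool]]) -> float:
    """Assumed RM@K: % of tasks resolved when the verifier keeps only its
    top-scored trajectory among the K candidates for each task."""
    hits = 0
    for task_scores, task_resolved in zip(scores, resolved):
        # Pick the highest-scoring trajectory; ties go to the earliest index.
        best = max(range(len(task_scores)), key=lambda i: task_scores[i])
        hits += bool(task_resolved[best])
    return 100.0 * hits / len(scores)


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) formulation: the fraction of
    (resolved, unresolved) pairs the verifier orders correctly, ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ece(probs: Sequence[float], labels: Sequence[bool], n_bins: int = 10) -> float:
    """Assumed ECE: bin predicted P(resolved) into equal-width bins, then
    average |empirical resolve rate - mean predicted probability|,
    weighted by the fraction of trajectories in each bin."""
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    err = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)
        acc = sum(1 for _, y in b if y) / len(b)
        err += len(b) / total * abs(acc - conf)
    return err
```

For example, with two tasks and three candidate trajectories each, `rm_at_k([[0.2, 0.9, 0.4], [0.7, 0.1, 0.3]], [[False, True, False], [False, False, True]])` returns `50.0`. The verifier picks a resolved trajectory for the first task but an unresolved one for the second. Under these assumed definitions, a higher AUC means better ranking and a lower ECE means better-calibrated scores.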