Trajectory Verification on SWE-bench Verified Qwen3-Coder-Flash trajectories
[Chart: RM@32 leaderboard plot. Plotted point: SWE-RM-30A3B, RM@32 = 62, Dec 26, 2025. Available metrics: RM@32, AUC, ECE.]
Updated 4d ago
Evaluation Results
Method        Type                  Date     RM@32  AUC    ECE
SWE-RM-30A3B  EF (execution-free)   2025.12  62     0.783  0.051
DEEP SWE      EB (execution-based)  2025.12  54.6   -      -
DEEP SWE      EF (execution-free)   2025.12  53.2   0.758  0.124
AGENTLESS     EB (execution-based)  2025.12  52.6   -      -
SWE-GYM       EF (execution-free)   2025.12  51.2   0.776  0.223
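
The AUC and ECE values reported above are standard classifier metrics. As a rough illustration only, the sketch below computes both for a verifier that assigns each trajectory a score in [0, 1], read as the predicted probability that the trajectory resolves the issue. The function names, the 10 equal-width bins, and the toy data are assumptions made for this example; the page does not state how the leaderboard computes its numbers, so the exact binning and averaging may differ.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: the probability that a resolved trajectory (label 1)
    is scored above an unresolved one (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width bins (an assumed binning scheme): the
    size-weighted mean gap between the average score and the observed
    fraction of positives in each bin."""
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal length")
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        # Clamp so that a score of exactly 1.0 falls into the last bin.
        idx = min(int(s * n_bins), n_bins - 1)
        bins[idx].append((s, y))
    total = len(scores)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_score = sum(s for s, _ in b) / len(b)
        frac_pos = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(mean_score - frac_pos)
    return ece


# Toy data (hypothetical): four trajectories, two of which resolved the issue.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
print(expected_calibration_error(toy_scores, toy_labels))
```

Under these definitions a higher AUC means the verifier ranks resolved trajectories above unresolved ones more often, and a lower ECE means its scores are closer to the observed success rates.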