Trajectory Verification on SWE-bench OpenHands-LM-32B trajectories (Verified)
Chart: RM@32 over time; best result 48.8 (SWE-RM-30A3B, Dec 26, 2025)
Metrics: RM@32, AUC, ECE
Evaluation Results
Method        Type                  Date     RM@32  AUC    ECE
SWE-RM-30A3B  EF (execution-free)   2025.12  48.8   0.748  0.08
DEEP SWE      EF (execution-free)   2025.12  44.6   0.732  0.118
DEEP SWE      EB (execution-based)  2025.12  44.2   -      -
AGENTLESS     EB (execution-based)  2025.12  42.4   -      -
SWE-GYM       EF (execution-free)   2025.12  41.6   0.718  0.164