Software Engineering Issue Resolution on SWE-Bench Verified (100-instance subset)
Top result: ContextWeaver, Pass@1 = 68
[Chart: Pass@1 over time, Apr 24 to May 14, 2026. Selectable metrics: Pass@1, Pass@5, Average Steps, Percent Instances with Fewer Steps]
Updated 19d ago
Evaluation Results
Method          Configuration                Date     Pass@1  Pass@5  Average Steps  Percent Instances with Fewer Steps
ContextWeaver   Backbone=Claude Sonnet...    2026.04  68      81      55.8           73
Sliding Window  Backbone=Claude Sonnet...    2026.04  67.2    78      59.2           27
InsightReplay   Model=Qwen3.5-35B-A3B,...    2026.05  29.75   -       -              -
CoT Zero-shot   Model=Qwen3.5-35B-A3B,...    2026.05  23.12   -       -              -
CoT Few-shot    Model=Qwen3.5-35B-A3B,...    2026.05  20.87   -       -              -

A dash (-) means the metric was not reported for that method.