Feasibility Prediction on SWE-bench
[Chart: F1@1, F1@All, and Fail F1 over time. Highlighted point: Qwen3 235B, 47.6 F1@1, May 29, 2026.]
Updated 1d ago
Evaluation Results
Method                  Date     F1@1  F1@All  Fail F1
Qwen3 235B              2026.05  47.6  35.1    32.8
GPT-5.2 Instant         2026.05  43.5  40.2    21.2
Claude Opus 4.7         2026.05  41.1  51.1    48.8
Gemini 3.1 Pro Preview  2026.05  39.2  58.2    52
Claude Sonnet 4.6       2026.05  32.2  37.7    23.4