SOTA Software Engineering benchmarks and papers with code | Wizwand
Software Engineering
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| SWE-bench Verified | JoyAI-LLM Flash | Accuracy | 62.6 | 33 | 13d ago |
| SWE-bench Verified | OpenHands | Success Rate | 71.8 | 29 | 4d ago |
| SWE-bench Verified | MemCoder | Resolution Rate | 83.8 | 26 | 18d ago |
| SWE-Bench Multilingual 1.0 (test) | Claude-Opus-4.5 | Resolution Rate | 75.2 | 20 | 1mo ago |
| SWE-Bench Verified | Mini-SWE | Pass@1 | 84 | 18 | 3d ago |
| SWE Verified | Claude-4.5-Sonnet | Resolution Rate | 77.2 | 17 | 1mo ago |
| SWE-Bench Pro 1.0 (test) | Claude-Opus-4.5 | Resolved Rate | 51.6 | 14 | 1mo ago |
| SWE-bench Lite | DFlash+DDTree | Speedup | 4.38 | 12 | 3d ago |
| SWE-bench | Qwen3.5-122B-A10B | Resolve Rate | 67.4 | 11 | 4d ago |
| Commit0-Lite | Claude Sonnet 4.5 | Score | 59.5 | 9 | 25d ago |
| PaperBench | Claude Sonnet 4.5 | Score | 66.8 | 9 | 25d ago |
| SWE-Bench Pro (public) | CCA | Resolve Rate (Pass@1) | 59 | 9 | 1mo ago |
| SWE-rebench January 2026 (test) | Claude Opus 4.6 | Resolved Rate | 52.9 | 8 | 1mo ago |
| SWE-bench Lite | TOOLSELF | Accuracy | 16.1 | 8 | 1mo ago |
| SWE-Bench Verified | Claude-Opus-4.5 | SWE-Agent Score | 78.2 | 7 | 1mo ago |
| SWE-Bench (val) | Claude 3.7 Sonnet | Acc | 28.8 | 7 | 1mo ago |
| SWE-Bench Verified | ProRL Agent-14B (RL) | Reproduced Success Rate | 23.6 | 6 | 29d ago |
| SWE-bench Verified | GEA | Worst-Case Success Rate | 71 | 6 | 1mo ago |
| SWE-bench Verified | Qwen3.5 27b | Generic Score | 72.4 | 4 | 17d ago |
| SWE-bench Verified 500 issues (Protocol A) | OpenHands CodeAct + GBT-SE | Success Rate (SR) | 73.6 | 4 | 1mo ago |
| SWE-Bench AgentLess Repair | MiMo-V2-Flash Base | Resolved Percentage | 30.8 | 4 | 1mo ago |
| SWE-Bench Verified | AgentSPEX | Score | 77.1 | 3 | 3d ago |
| SWE-Bench Python subset Pro | SageAgent | Resolution Rate | 59 | 3 | 1mo ago |
| SWE Verified MEDIUM reasoning | HarmonyAgent | Overall Score | 53.3 | 2 | 17d ago |
| SWE Verified HIGH reasoning | gpt-oss-20b | Accuracy | 60.7 | 2 | 17d ago |
Showing 25 of 29 rows