
Software Engineering Performance on SWE-bench Lite (Accuracy)

Best accuracy: 16.1 (TOOLSELF)

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
TOOLSELF 2026.02		16.1
SWE-Search 2026.02		14.6
SWE Agent 2026.02		14.2
Vanilla Agent 2026.02		13.3
TOOLSELF 2026.02		12.4
SWE-Search 2026.02		11.6
SWE Agent 2026.02		10.9
Vanilla Agent 2026.02		10.1