Test Writing on SWE-Atlas
Score: 37.78
[Chart: Score over time; point shown for May 8, 2026]
Updated 15d ago
Evaluation Results

Method                Backbone LLM    Date      Score (larger is better)
(name not captured)   GPT-5.2         2026.05   37.78
Claude Code*          Claude Op...    2026.05   36.67
Codex                 GPT-5.2         2026.05   32.22
mini-swe-agent        GPT-5.2         2026.05   27.78