Design Optimization on 10 optimization shared-core domains
[Chart: Success Rate. Top result: Opus 4.6, 67.5 (Mar 13, 2026)]
Updated 1mo ago
Evaluation Results
| Model | Setting | Date | Success Rate |
| --- | --- | --- | --- |
| Opus 4.6 | Turns=20 (Opt.) | 2026.03 | 67.5 |
| Sonnet 4.6 | Turns=20 (Opt.) | 2026.03 | 63 |
| Sonnet 4.5 | Turns=20 (Opt.) | 2026.03 | 61.5 |
| GPT-5.2 | Turns=20 (Opt.) | 2026.03 | 58.5 |
| Gemini 3.1 Pro | Turns=20 (Opt.) | 2026.03 | 56 |
| Gemini 2.0 Flash | Turns=20 (Opt.) | 2026.03 | 48 |
| GPT-4o | Turns=20 (Opt.) | 2026.03 | 41.5 |
| Opus 4.6 | Turns=1 (Opt.) | 2026.03 | 34.5 |
| Gemini 3.1 Pro | Turns=1 (Opt.) | 2026.03 | 31.3 |
| Sonnet 4.6 | Turns=1 (Opt.) | 2026.03 | 31.2 |
| Sonnet 4.5 | Turns=1 (Opt.) | 2026.03 | 30.5 |
| GPT-5.2 | Turns=1 (Opt.) | 2026.03 | 29 |
| Gemini 2.0 Flash | Turns=1 (Opt.) | 2026.03 | 22.2 |
| GPT-4o | Turns=1 (Opt.) | 2026.03 | 17.8 |