Agentic Task Performance on Agent Capabilities
Chart: Success Rate. Best result: Gemini-3 Pro, 90.4 (Mar 10, 2026).
Evaluation Results
Method                 Details                    Date     Success Rate
Gemini-3 Pro           Model variant=low          2026.03  90.4
Gemini-3 Pro           Model variant=high         2026.03  90.1
gpt-5.2                Model variant=high         2026.03  85.7
gpt-5-mini             Price Range=cost-effec...  2026.03  85.1
gpt-5.2                Model variant=instant      2026.03  81.1
kimi-k2                Model variant=thinking     2026.03  77.3
gpt-4.1                -                          2026.03  73.3
sabia-4                -                          2026.03  72.2
Qwen3                  Model variant=235b         2026.03  67.8
gpt-oss-120b           Price Range=cost-effec...  2026.03  60.9
gpt-4.1-mini           Price Range=cost-effec...  2026.03  59.4
sabiazinho-4           Price Range=cost-effec...  2026.03  55.2
sabia-3.1              -                          2026.03  43.1
deepseek               Model variant=v3.2         2026.03  40.5
gemini-2.5-flash-lite  Price Range=cost-effec...  2026.03  18