
Agentic Task Performance on Agent Capabilities

Best result: 90.4 Success Rate (Gemini-3 Pro)

Updated 4mo ago

Evaluation Results

Method	Date	Success Rate
Gemini-3 Pro	2026.03	90.4
Gemini-3 Pro	2026.03	90.1
gpt-5.2	2026.03	85.7
gpt-5-mini	2026.03	85.1
gpt-5.2	2026.03	81.1
kimi-k2	2026.03	77.3
gpt-4.1	2026.03	73.3
sabia-4	2026.03	72.2
Qwen3	2026.03	67.8
gpt-oss-120b	2026.03	60.9
gpt-4.1-mini	2026.03	59.4
sabiazinho-4	2026.03	55.2
sabia-3.1	2026.03	43.1
deepseek	2026.03	40.5
gemini-2.5-flash-lite	2026.03	18