Agentic Performance on TAU2-Bench

85.4 Success Rate

Gemini 3-Pro

Updated 2mo ago

Evaluation Results

Method	Links	Success Rate
Gemini 3-Pro 2026.02		85.4
DS V3.2-Thinking 2026.02		80.3
GPT-5 (High) 2026.02		80.1
ERNIE 5.0 2026.02		78.79
Lowest Centroid 2026.04		73
Pass@1 2026.04		66.1
Greedy Decoding 2026.04		65
Lowest Centroid 2026.04		65
Pass@1 2026.04		58.5
Gemini 2.5-Pro 2026.02		56.2
Greedy Decoding 2026.04		56
Bottom Window 2026.04		53
Tail Confidence 2026.04		50
Bottom Window 2026.04		48
Tail Confidence 2026.04		32
Greedy Decoding 2026.04		27
Lowest Centroid 2026.04		27
Pass@1 2026.04		20.1
Bottom Window 2026.04		9
Tail Confidence 2026.04		2