Agentic Performance on TAU2-Bench
[Chart: Success Rate over time, Feb 4, 2026 to Apr 28, 2026. Top result: Gemini 3-Pro, 85.4. Updated 1mo ago.]
Evaluation Results
| Method | Method details | Date | Success Rate |
|---|---|---|---|
| Gemini 3-Pro | | 2026.02 | 85.4 |
| DS V3.2-Thinking | | 2026.02 | 80.3 |
| GPT-5 (High) | | 2026.02 | 80.1 |
| ERNIE 5.0 | | 2026.02 | 78.79 |
| Lowest Centroid | Model=Minimax-M2.5, Mo... | 2026.04 | 73 |
| Pass@1 | Model=Minimax-M2.5, Mo... | 2026.04 | 66.1 |
| Greedy Decoding | Model=Minimax-M2.5, Mo... | 2026.04 | 65 |
| Lowest Centroid | Model=Qwen3-Coder, Mod... | 2026.04 | 65 |
| Pass@1 | Model=Qwen3-Coder, Mod... | 2026.04 | 58.5 |
| Gemini 2.5-Pro | | 2026.02 | 56.2 |
| Greedy Decoding | Model=Qwen3-Coder, Mod... | 2026.04 | 56 |
| Bottom Window | Model=Qwen3-Coder, Mod... | 2026.04 | 53 |
| Tail Confidence | Model=Qwen3-Coder, Mod... | 2026.04 | 50 |
| Bottom Window | Model=Minimax-M2.5, Mo... | 2026.04 | 48 |
| Tail Confidence | Model=Minimax-M2.5, Mo... | 2026.04 | 32 |
| Greedy Decoding | Model=Qwen3-Next, Mode... | 2026.04 | 27 |
| Lowest Centroid | Model=Qwen3-Next, Mode... | 2026.04 | 27 |
| Pass@1 | Model=Qwen3-Next, Mode... | 2026.04 | 20.1 |
| Bottom Window | Model=Qwen3-Next, Mode... | 2026.04 | 9 |
| Tail Confidence | Model=Qwen3-Next, Mode... | 2026.04 | 2 |