Agent on τ2-Bench
[Chart: Accuracy over time, Dec 30, 2025 to Mar 26, 2026. Top result: Gemini-3-Pro, 85.4 Accuracy. Updated 22d ago.]
Evaluation Results

Method                   Details                      Date      Accuracy
Gemini-3-Pro             -                            2026.03   85.4
Intern-S1-Pro            Number of Parameters=1...    2026.03   80.9
Kimi-K2.5                Number of Parameters=1...    2026.03   76.8
GPT-5.2                  -                            2026.03   76.6
LongCat-Flash Exp-Chat   Evaluation Mode=Chat         2025.12   69.5
GLM 4.6                  Evaluation Mode=Chat         2025.12   69.1
LongCat-Flash Chat       Evaluation Mode=Chat         2025.12   68.8
DeepSeek V3.2            Evaluation Mode=Chat         2025.12   64
Qwen3-VL-235B-Thinking   Number of Parameters=2...    2026.03   57.4