General Task (Agentic Coding) on tau2-Bench Telecom
[Chart: Score by date. Highest point shown: GLM-5, 98.2, Mar 29, 2026]
Updated 19d ago
Evaluation Results
Model            Method                     Date     Score
GLM-5            Source=artificialanaly...  2026.03  98.2
Gemini 3.1 Pro   Source=artificialanaly...  2026.03  95.6
KAT-Coder-V2     Evaluation Environment...  2026.03  93.9
Claude Opus 4.6  Source=artificialanaly...  2026.03  92.1
GPT-5.4          Source=artificialanaly...  2026.03  91.5
MiniMax M2.7     Source=artificialanaly...  2026.03  84.8