Share your thoughts, 1 month free Claude Pro on us
See more
Home
/
Benchmarks
Agentic Tool Use on τ²-Bench Telecom
Loading...
100
Accuracy
GPT 5.4
-4
23
50
77
Apr 9, 2026
Apr 15, 2026
Apr 21, 2026
Apr 27, 2026
May 3, 2026
May 9, 2026
May 16, 2026
Accuracy
Updated 15d ago
Evaluation Results
Method
Method
Links
Accuracy
GPT 5.4
Deployment=Cloud baseline
2026.05
100
Qwen3.5-27B
Architecture=Dense, #...
2026.04
99.3
Claude Opus 4.6
Deployment=Cloud baseline
2026.05
89.4
Qwen3.5-122B
Deployment=Local model
2026.05
86.2
Best Local
Config=Oracle local ro...
2026.05
86.2
Gemini 3.1 Pro
Deployment=Cloud baseline
2026.05
85
Qwen3.5-27B
Deployment=Local model
2026.05
84.9
Qwen3.5-35B
Deployment=Local model
2026.05
83.1
Gemma4-26B
Deployment=Local model
2026.05
78.5
Qwen3.5-9B
Deployment=Local model
2026.05
75.3
GPT-5 mini
Reasoning Mode=REASONI...
2026.04
74.1
K-EXAONE-236B-A23B
Architecture=MoE, # To...
2026.04
73.5
EXAONE 4.5 33B
Architecture=Dense, #...
2026.04
73
Nemotron-Super-120B
Deployment=Local model
2026.05
68.3
Gemma4-E4B
Deployment=Local model
2026.05
61.3
Qwen3-VL-235B-A22B
Architecture=MoE, # To...
2026.04
44.7
Granite 3.3 8B
Deployment=Local model
2026.05
5.3
Granite 4.0 H-Small
Deployment=Local model
2026.05
0
Feedback
Search any
task
Search any
task