Tool Use on τ-Bench (TauB) V2 (accuracy)
[Chart: Accuracy by model; top result 91.6 (Qwen3.5-122B), May 16, 2026]
Evaluation Results
Method               Setting                    Date     Accuracy
Qwen3.5-122B         Deployment=Local model     2026.05  91.6
Best Local           Config=Oracle local ro...  2026.05  91.6
Gemma4-26B           Deployment=Local model     2026.05  91.3
Gemini 3.1 Pro       Deployment=Cloud baseline  2026.05  90.8
Qwen3.5-35B          Deployment=Local model     2026.05  90.2
Claude Opus 4.6      Deployment=Cloud baseline  2026.05  89.5
GPT 5.4              Deployment=Cloud baseline  2026.05  89.2
Qwen3.5-27B          Deployment=Local model     2026.05  88.4
Qwen3.5-9B           Deployment=Local model     2026.05  77.1
Gemma4-E4B           Deployment=Local model     2026.05  56.1
Nemotron-Super-120B  Deployment=Local model     2026.05  36.8
Granite 4.0 H-Small  Deployment=Local model     2026.05  17.5
Granite 3.3 8B       Deployment=Local model     2026.05  10.5
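This sketch is not part of the leaderboard. It encodes the labelled rows of the results table as data and compares the best local model with the best cloud baseline. The `Result` type and the `best` and `local_vs_cloud` helpers are hypothetical names chosen for illustration. The "Best Local" oracle configuration is left out because it carries no deployment label.

```python
from dataclasses import dataclass


# Hypothetical helper type; values are transcribed from the results table above.
@dataclass(frozen=True)
class Result:
    model: str
    deployment: str  # "Local model" or "Cloud baseline", as labelled on the leaderboard
    accuracy: float  # τ-Bench V2 tool-use accuracy (2026.05 entries)


RESULTS = [
    Result("Qwen3.5-122B", "Local model", 91.6),
    Result("Gemma4-26B", "Local model", 91.3),
    Result("Gemini 3.1 Pro", "Cloud baseline", 90.8),
    Result("Qwen3.5-35B", "Local model", 90.2),
    Result("Claude Opus 4.6", "Cloud baseline", 89.5),
    Result("GPT 5.4", "Cloud baseline", 89.2),
    Result("Qwen3.5-27B", "Local model", 88.4),
    Result("Qwen3.5-9B", "Local model", 77.1),
    Result("Gemma4-E4B", "Local model", 56.1),
    Result("Nemotron-Super-120B", "Local model", 36.8),
    Result("Granite 4.0 H-Small", "Local model", 17.5),
    Result("Granite 3.3 8B", "Local model", 10.5),
]


def best(deployment: str) -> Result:
    """Return the highest-scoring entry for one deployment type (hypothetical helper)."""
    return max((r for r in RESULTS if r.deployment == deployment), key=lambda r: r.accuracy)


def local_vs_cloud() -> tuple[Result, Result, float]:
    """Compare the top local model with the top cloud baseline (hypothetical helper)."""
    local = best("Local model")
    cloud = best("Cloud baseline")
    # Round to one decimal to match the precision reported on the leaderboard
    # and to hide floating-point noise in the subtraction.
    gap = round(local.accuracy - cloud.accuracy, 1)
    return local, cloud, gap


if __name__ == "__main__":
    local, cloud, gap = local_vs_cloud()
    print(f"Best local: {local.model} ({local.accuracy})")
    print(f"Best cloud: {cloud.model} ({cloud.accuracy})")
    print(f"Gap: {gap:+.1f} points")
    above = [r.model for r in RESULTS
             if r.deployment == "Local model" and r.accuracy > cloud.accuracy]
    print("Local models above the best cloud baseline:", ", ".join(above))
```

Run as a script, it reports Qwen3.5-122B (91.6) ahead of Gemini 3.1 Pro (90.8) by 0.8 points. Gemma4-26B is the only other local model above the best cloud baseline.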