Agentic Tool Use on τ²-Bench Retail
Best result: 84.7 accuracy (Qwen3.5-27B)
[Chart: Accuracy vs. date (Apr 9, 2026)]
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Qwen3.5-27B | Architecture=Dense, #... | 2026.04 | 84.7 |
| K-EXAONE-236B-A23B | Architecture=MoE, # To... | 2026.04 | 78.6 |
| GPT-5 mini | Reasoning Mode=REASONI... | 2026.04 | 78.3 |
| EXAONE 4.5 33B | Architecture=Dense, #... | 2026.04 | 77.9 |
| Qwen3-VL-235B-A22B | Architecture=MoE, # To... | 2026.04 | 67 |