Agentic Tool Use on τ²-Bench Airline
[Chart: Accuracy by date. Highlighted result: Qwen3.5-27B, 67.5 accuracy, Apr 9, 2026.]
Updated 6d ago
Evaluation Results
Method                Details                     Date      Accuracy
Qwen3.5-27B           Architecture=Dense, #...    2026.04   67.5
Qwen3-VL-235B-A22B    Architecture=MoE, # To...   2026.04   62
K-EXAONE-236B-A23B    Architecture=MoE, # To...   2026.04   60.4
GPT-5 mini            Reasoning Mode=REASONI...   2026.04   60
EXAONE 4.5 33B        Architecture=Dense, #...    2026.04   56.5