Agentic Reasoning on TauBench V2
[Chart: Airline Score by date; Qwen3.5-122B-A10B at 66 on Apr 14, 2026. Selectable metrics: Airline Score, Retail Score, Telecom Score, Average Score.]
Updated 4d ago
Evaluation Results
| Method | Date | Airline Score | Retail Score | Telecom Score | Average Score |
|---|---|---|---|---|---|
| Qwen3.5-122B-A10B | 2026.04 | 66 | 62.6 | 95 | 74.53 |
| Nemotron 3 Super | 2026.04 | 56.25 | 62.83 | 64.36 | 61.15 |
| GPT-OSS-120B | 2026.04 | 49.2 | 67.8 | 66 | 61 |
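
The page does not say how the Average Score is derived. The reported values are consistent with an unweighted mean of the Airline, Retail, and Telecom scores, and the short sketch below checks that assumption against the figures listed above. The function name `mean_score` and the 0.01 tolerance are illustrative choices, not part of the benchmark.

```python
# Scores copied from the reported results:
# (airline, retail, telecom, reported average)
scores = {
    "Qwen3.5-122B-A10B": (66, 62.6, 95, 74.53),
    "Nemotron 3 Super": (56.25, 62.83, 64.36, 61.15),
    "GPT-OSS-120B": (49.2, 67.8, 66, 61),
}


def mean_score(airline, retail, telecom):
    """Unweighted mean of the three domain scores (assumed formula)."""
    return (airline + retail + telecom) / 3


# Recompute each average, rounded to two decimals like the reported column.
recomputed = {
    name: round(mean_score(airline, retail, telecom), 2)
    for name, (airline, retail, telecom, _reported) in scores.items()
}

for name, (_a, _r, _t, reported) in scores.items():
    # Allow a small tolerance for rounding in the published numbers.
    matches = abs(recomputed[name] - reported) < 0.01
    print(f"{name}: recomputed={recomputed[name]} reported={reported} match={matches}")
```

If the assumption holds, every row reports a match. A mismatch would suggest a weighted average or a different rounding rule.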