Agentic Tasks on Tau2 Airline
[Chart: Score over time, Jan 3, 2026 to Jan 11, 2026. Current top score: 64, GPT-5.1 (Medium).]
Updated 4d ago
Evaluation Results
Method                  Details                    Date     Score
GPT-5.1 (Medium)        User-simulator=GPT-4.1     2026.01  64
GPT-5.1 (Medium)        User-simulator=GPT-5.1     2026.01  64
GLM-4.5-Air             Number of Parameters=110B  2026.01  60.8
HyperCLOVA X 32B Think  User-simulator=GPT-4.1     2026.01  58
HyperCLOVA X 32B Think  User-simulator=GPT-5.1     2026.01  58
gpt-oss-120b            Number of Parameters=1...  2026.01  56
gpt-oss-120b            Number of Parameters=1...  2026.01  52.8
Solar Open              Number of Parameters=102B  2026.01  52.4
Qwen3 235B-A22B         User-simulator=GPT-5.1     2026.01  41.6
Qwen3 235B-A22B         User-simulator=GPT-4.1     2026.01  36.8