Agentic Tasks on Tau2 Airline
Best score: 64, GPT-5.1 (Medium)
[Chart: Score over time, Jan 3, 2026 to Jan 11, 2026]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Score |
|---|---|---|---|
| GPT-5.1 (Medium) | User-simulator=GPT-4.1 | 2026.01 | 64 |
| GPT-5.1 (Medium) | User-simulator=GPT-5.1 | 2026.01 | 64 |
| GLM-4.5-Air | Number of Parameters=110B | 2026.01 | 60.8 |
| HyperCLOVA X 32B Think | User-simulator=GPT-4.1 | 2026.01 | 58 |
| HyperCLOVA X 32B Think | User-simulator=GPT-5.1 | 2026.01 | 58 |
| gpt-oss-120b | Number of Parameters=1... | 2026.01 | 56 |
| gpt-oss-120b | Number of Parameters=1... | 2026.01 | 52.8 |
| Solar Open | Number of Parameters=102B | 2026.01 | 52.4 |
| Qwen3 235B-A22B | User-simulator=GPT-5.1 | 2026.01 | 41.6 |
| Qwen3 235B-A22B | User-simulator=GPT-4.1 | 2026.01 | 36.8 |