Agentic Task on tau2 Telecom
[Chart: best Accuracy over time, Jan 3, 2026 to Jan 14, 2026. Current best: 92.3 (HyperCLOVA X 32B Think). Updated 4d ago.]
Evaluation Results
Method | Links | Accuracy
--- | --- | ---
HyperCLOVA X 32B Think; User-simulator=GPT-5.1 | 2026.01 | 92.3
GPT-5.1 (Medium); User-simulator=GPT-5.1 | 2026.01 | 81.2
GPT-5.1 (Medium); User-simulator=GPT-4.1 | 2026.01 | 80.9
GLM-4.6; Thinking Mode=true, Pa... | 2026.01 | 70.5
HyperCLOVA X 32B Think; User-simulator=GPT-4.1 | 2026.01 | 65.1
A.X K1; Thinking Mode=true, Pa... | 2026.01 | 58.1
gpt-oss-120b; Number of Parameters=1... | 2026.01 | 57.7
Solar Open; Number of Parameters=102B | 2026.01 | 55.6
gpt-oss-120b; Number of Parameters=1... | 2026.01 | 47.4
DeepSeek-V3.1; Thinking Mode=true, Pa... | 2026.01 | 37.4
GLM-4.5-Air; Number of Parameters=110B | 2026.01 | 28.1
Qwen3 235B-A22B; User-simulator=GPT-5.1 | 2026.01 | 24.2
Qwen3 235B-A22B; User-simulator=GPT-4.1 | 2026.01 | 20.5