Agentic Capability on tau^2-Bench Telecom
Best Pass@1: 89, Mi:dm K 2.5 Pro (March ‘26)
[Chart: Pass@1 by date; point shown: Mar 19, 2026]
Updated 1mo ago
Evaluation Results
Method                        Reasoning  Date     Pass@1
Mi:dm K 2.5 Pro (March ‘26)   On         2026.03  89
HyperCLOVAX-SEED-Think-32B    On         2026.03  87
K-EXAONE-236B-A23B            On         2026.03  74
K-EXAONE-236B-A23B            Off        2026.03  59
Solar-Open-100B               On         2026.03  48
Qwen-3-30B-A3B                On         2026.03  28
Qwen-3-30B-A3B                Off        2026.03  22