Agentic Tool Use on Tau2-Telecom
Best result: LongCat-Flash-Lite, 72.8 (Avg@8)
[Chart: Avg@8 over time, Jan 29, 2026 to Mar 29, 2026]
Updated 19d ago
Evaluation Results
| Method | Details | Date | Avg@8 |
|---|---|---|---|
| LongCat-Flash-Lite | Architecture=MoE + NE,... | 2026.01 | 72.8 |
| LongCat-Next | | 2026.03 | 62.06 |
| Gemini 2.5 Flash-Lite | | 2026.01 | 21.93 |
| Kimi-Linear-48B-A3B | Architecture=MoE, # To... | 2026.01 | 15.68 |
| Kimi-Linear-48B-A3B | | 2026.03 | 15.68 |
| Qwen3-Next-80B-A3B-Instruct | Architecture=MoE, # To... | 2026.01 | 13.2 |
| Qwen3-Next-80B-A3B-Instruct | | 2026.03 | 13.2 |
| Qwen3-Omni-A3B-Instruct | | 2026.03 | 4.39 |