Agentic Tool Use on Tau2 Retail
[Chart: Avg@8 by date. Highlighted point: LongCat-Flash-Lite, 73.1 Avg@8, Jan 29, 2026. Updated 4d ago.]
Evaluation Results
Method                       Details                    Date     Avg@8
LongCat-Flash-Lite           Architecture=MoE + NE,...  2026.01  73.1
Qwen3-Next-80B-A3B-Instruct  Architecture=MoE, # To...  2026.01  57.3
Gemini 2.5 Flash-Lite        -                          2026.01  37.5
Kimi-Linear-48B-A3B          Architecture=MoE, # To...  2026.01  18.86