Tool Use Accuracy on tau2-Bench
[Chart: Accuracy over time, Sep 2025 to Dec 2025. Best reported result: 30.7 (Qwen2.5-32B-CodeGym).]
Evaluation Results
| Method | Settings | Date | Accuracy |
|---|---|---|---|
| Qwen2.5-32B-CodeGym | CoT Pattern=Short-CoT,... | 2025.09 | 30.7 |
| QwQ-32B-CodeGym | CoT Pattern=Long-CoT,... | 2025.09 | 30.7 |
| QwQ-32B | CoT Pattern=Long-CoT,... | 2025.09 | 26.1 |
| Qwen2.5-72B-CodeGym | CoT Pattern=Short-CoT,... | 2025.09 | 25.8 |
| Qwen2.5-32B-Instruct | CoT Pattern=Short-CoT,... | 2025.09 | 24.7 |
| Qwen2.5-72B-Instruct | CoT Pattern=Short-CoT,... | 2025.09 | 22.6 |
| Qwen2.5-14B-Instruct | CoT Pattern=Short-CoT,... | 2025.09 | 20.9 |
| Qwen2.5-14B-CodeGym | CoT Pattern=Short-CoT,... | 2025.09 | 19.9 |
| Qwen2.5-7B-CodeGym | CoT Pattern=Short-CoT,... | 2025.09 | 15.5 |
| Youtu-LLM 2B | Non-thinking Mode=true... | 2025.12 | 15 |
| Qwen2.5-7B-Instruct | CoT Pattern=Short-CoT,... | 2025.09 | 14.9 |
| Qwen3 4B | Non-thinking Mode=true... | 2025.12 | 10.9 |
| SmolLM3 3B | Non-thinking Mode=true... | 2025.12 | 9.7 |
| Qwen3 1.7B | Non-thinking Mode=true... | 2025.12 | 2.6 |
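The results table lends itself to a direct comparison between each CodeGym model and the model it appears to be built on. The short Python sketch below computes those accuracy differences from the scores in the table. The pairing of CodeGym variants with baselines is an assumption inferred only from the model names; the page does not state it, and any differences hidden behind the truncated settings column are ignored. The names `RESULTS`, `PAIRS`, `codegym_deltas`, and `leaders` are hypothetical and exist only for this illustration.

```python
# Accuracy scores copied from the Evaluation Results table.
RESULTS = {
    "Qwen2.5-32B-CodeGym": 30.7,
    "QwQ-32B-CodeGym": 30.7,
    "QwQ-32B": 26.1,
    "Qwen2.5-72B-CodeGym": 25.8,
    "Qwen2.5-32B-Instruct": 24.7,
    "Qwen2.5-72B-Instruct": 22.6,
    "Qwen2.5-14B-Instruct": 20.9,
    "Qwen2.5-14B-CodeGym": 19.9,
    "Qwen2.5-7B-CodeGym": 15.5,
    "Youtu-LLM 2B": 15.0,
    "Qwen2.5-7B-Instruct": 14.9,
    "Qwen3 4B": 10.9,
    "SmolLM3 3B": 9.7,
    "Qwen3 1.7B": 2.6,
}

# Hypothetical pairing of each CodeGym model with its assumed baseline,
# inferred from the model names alone (not stated on the leaderboard).
PAIRS = {
    "Qwen2.5-7B-CodeGym": "Qwen2.5-7B-Instruct",
    "Qwen2.5-14B-CodeGym": "Qwen2.5-14B-Instruct",
    "Qwen2.5-32B-CodeGym": "Qwen2.5-32B-Instruct",
    "Qwen2.5-72B-CodeGym": "Qwen2.5-72B-Instruct",
    "QwQ-32B-CodeGym": "QwQ-32B",
}


def codegym_deltas(results, pairs):
    """Accuracy change in points (CodeGym minus baseline), rounded to 0.1
    to hide floating-point noise from the subtraction."""
    return {cg: round(results[cg] - results[base], 1) for cg, base in pairs.items()}


def leaders(results):
    """All methods tied for the highest accuracy, sorted by name."""
    best = max(results.values())
    return sorted(m for m, score in results.items() if score == best)


if __name__ == "__main__":
    for model, delta in codegym_deltas(RESULTS, PAIRS).items():
        print(f"{model}: {delta:+.1f}")
    print("Top:", leaders(RESULTS))
```

Under this pairing, the CodeGym variants change accuracy by +0.6 (7B), -1.0 (14B), +6.0 (32B), +3.2 (72B), and +4.6 (QwQ-32B). The `leaders` helper also makes explicit that the top score of 30.7 is a tie between Qwen2.5-32B-CodeGym and QwQ-32B-CodeGym.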