Tool Use Evaluation on GTM
[Chart: Average Score over time. Top result: GTM-1.5B, 89.4 (Dec 4, 2025)]
Updated 4d ago
Evaluation Results
Method                   Model Size   Date      Average Score
GTM-1.5B                 1.5B         2025.12   89.4
Qwen2.5-14B-Instruct     14B          2025.12   85.8
Qwen2.5-7B-Instruct      7B           2025.12   83
InternLM2.5-20B          20B          2025.12   69.4
Llama-3.2-3B-Instruct    3B           2025.12   68.3
Qwen2.5-3B-Instruct      3B           2025.12   65.3
Qwen2.5-1.5B-Instruct    1.5B         2025.12   61.2
Qwen2.5-0.5B-Instruct    0.5B         2025.12   45.1
Llama-3.2-1B-Instruct    1B           2025.12   39.8
InternLM2.5-7B           7B           2025.12   39.2
InternLM2.5-1.8B         1.8B         2025.12   10.8