Tool-calling on ACEBench Extended Setting
Best Overall Score: 65.17 (GT_Funs), Mar 12, 2026
Evaluation Results
Method              Model Scale      Date     Overall Score
GT_Funs             Qwen2.5-7B...    2026.03  65.17
ToolGT (Prompting)  Qwen2.5-7B...    2026.03  62.42
Tool-DC (TF)        Qwen2.5-7B...    2026.03  58.83
All_Funs            Qwen2.5-7B...    2026.03  58.58
HiTEC-ICL           Qwen2.5-7B...    2026.03  54.67
GT_Funs             Qwen2.5-3B...    2026.03  54.42
Tool-DC (TF)        Qwen2.5-3B...    2026.03  48.17
GT_Funs             Qwen2.5-1....    2026.03  47.92
ToolGT (Prompting)  Qwen2.5-3B...    2026.03  46.58
Top-K               Qwen2.5-7B...    2026.03  46.31
Tool-DC (TF)        Qwen2.5-1....    2026.03  46.08
Top-K               Qwen2.5-1....    2026.03  38.58
Top-K               Qwen2.5-3B...    2026.03  38.02
All_Funs            Qwen2.5-3B...    2026.03  36.5
ToolGT (Prompting)  Qwen2.5-1....    2026.03  35.33
HiTEC-ICL           Qwen2.5-3B...    2026.03  34.92
HiTEC-ICL           Qwen2.5-1....    2026.03  25.42
All_Funs            Qwen2.5-1....    2026.03  22