Tool Invocation Refusal on SABEval (D3)
[Chart: Tool Invocation Rate (TIR). Highlighted point: Base, 85.56, Apr 13, 2026.]
Updated 5d ago
Evaluation Results
Method   Model         Date      Tool Invocation Rate (TIR)
Base     Qwen3-4B      2026.04   85.56
Base     Qwen3-14B     2026.04   83.9
Base     Qwen3-8B      2026.04   75.56
Prompt   Qwen3-4B      2026.04   69.71
Prompt   ToolACE-2.5   2026.04   67.71
Base     ToolACE-2.5   2026.04   67.41
Prompt   Qwen3-8B      2026.04   57.66
Prompt   Qwen3-14B     2026.04   56.78
Base     Watt-Tool     2026.04   31.51
Prompt   Watt-Tool     2026.04   29.61
CAA      Qwen3-4B      2026.04   27.76
CAA      Qwen3-14B     2026.04   15.51
CAA      Qwen3-8B      2026.04   8.98
CAA      Watt-Tool     2026.04   8.1
CAA      ToolACE-2.5   2026.04   6.88