Tool Invocation Refusal on SABEval 1.0 (D0)
Best reported Tool Invocation Rate (TIR): 2.54 (CAA)
[Chart: Tool Invocation Rate (TIR) over time; most recent entry Apr 13, 2026]
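The page names the metric but does not define it. Given the name and the refusal setting, one plausible reading is the percentage of evaluation prompts on which the model emits a tool call even though it should refuse. The sketch below assumes that definition; `Episode` and `tool_invocation_rate` are hypothetical names for illustration, not part of SABEval.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical record: one prompt on which the model is expected to refuse tool use.
    prompt_id: str
    invoked_tool: bool  # True if the model emitted a tool call anyway


def tool_invocation_rate(episodes: list[Episode]) -> float:
    """Percentage of episodes with a tool call (assumed definition of TIR)."""
    if not episodes:
        raise ValueError("need at least one episode")
    invoked = sum(1 for e in episodes if e.invoked_tool)
    return 100.0 * invoked / len(episodes)


episodes = [
    Episode("p1", False),
    Episode("p2", True),
    Episode("p3", False),
    Episode("p4", False),
]
print(tool_invocation_rate(episodes))  # 25.0
```

Under this reading, a lower TIR means the model refuses more often, which matches the leaderboard listing the smallest values first.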
Evaluation Results

Rows are sorted by Tool Invocation Rate (TIR), ascending.

| Method | Model       | Date    | TIR   |
|--------|-------------|---------|-------|
| CAA    | Watt-Tool   | 2026.04 | 2.54  |
| CAA    | ToolACE-2.5 | 2026.04 | 3.41  |
| CAA    | Qwen3-8B    | 2026.04 | 4.29  |
| CAA    | Qwen3-14B   | 2026.04 | 5.56  |
| CAA    | Qwen3-4B    | 2026.04 | 8.29  |
| Prompt | Watt-Tool   | 2026.04 | 8.59  |
| Base   | Watt-Tool   | 2026.04 | 8.68  |
| Prompt | Qwen3-8B    | 2026.04 | 19.9  |
| Prompt | Qwen3-14B   | 2026.04 | 24.29 |
| Prompt | Qwen3-4B    | 2026.04 | 27.12 |
| Base   | Qwen3-8B    | 2026.04 | 30.39 |
| Base   | Qwen3-4B    | 2026.04 | 37.61 |
| Base   | ToolACE-2.5 | 2026.04 | 37.95 |
| Prompt | ToolACE-2.5 | 2026.04 | 39.12 |
| Base   | Qwen3-14B   | 2026.04 | 41.37 |
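To compare methods within each model, the snippet below transcribes the Evaluation Results values and computes each method's percent change in TIR relative to that model's Base row. The relative-reduction figure is an illustrative derived metric, not something the leaderboard reports, and treating lower TIR as better is an assumption based on the leaderboard's ascending sort. `RESULTS`, `relative_reduction`, and `ranked_rows` are hypothetical names.

```python
# TIR values transcribed from the Evaluation Results listing (SABEval 1.0, D0).
RESULTS: dict[str, dict[str, float]] = {
    "Watt-Tool":   {"Base": 8.68,  "Prompt": 8.59,  "CAA": 2.54},
    "ToolACE-2.5": {"Base": 37.95, "Prompt": 39.12, "CAA": 3.41},
    "Qwen3-4B":    {"Base": 37.61, "Prompt": 27.12, "CAA": 8.29},
    "Qwen3-8B":    {"Base": 30.39, "Prompt": 19.9,  "CAA": 4.29},
    "Qwen3-14B":   {"Base": 41.37, "Prompt": 24.29, "CAA": 5.56},
}


def relative_reduction(model: str, method: str) -> float:
    """Percent drop in TIR versus the same model's Base row.

    Positive means fewer tool invocations than Base; negative means more.
    Illustrative metric only, not reported by the leaderboard.
    """
    base = RESULTS[model]["Base"]
    return 100.0 * (base - RESULTS[model][method]) / base


def ranked_rows() -> list[tuple[str, str, float]]:
    """All (method, model, TIR) rows sorted ascending by TIR, as on the leaderboard."""
    rows = [
        (method, model, tir)
        for model, by_method in RESULTS.items()
        for method, tir in by_method.items()
    ]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for model in RESULTS:
        prompt = relative_reduction(model, "Prompt")
        caa = relative_reduction(model, "CAA")
        print(f"{model:12s} Prompt {prompt:6.1f}%  CAA {caa:6.1f}%")
```

On these numbers CAA has the lowest TIR for every model, while Prompt ranges from a slight increase over Base (ToolACE-2.5) to a drop of roughly 41% (Qwen3-14B).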