Tool Calling on WHEN2TOOL Easy
Figure: ΔAcc by method over time (metric selector: ΔAcc, ΔTC, ΔAcc/-ΔTC). Best result: Necessary (N), ΔAcc -0.3; x-axis date May 10, 2026.
Evaluation Results
Method | Details | Date | ΔAcc | ΔTC | ΔAcc/-ΔTC
--- | --- | --- | --- | --- | ---
Necessary (N) | model averaging=six mo... | 2026.05 | -0.3 | -0.09 | -3.5
PROBE&PREFILL | model averaging=six mo... | 2026.05 | -1.1 | -0.66 | -1.6
Sparse (S) | model averaging=six mo... | 2026.05 | -6.3 | -0.55 | -11.3
Necessary + Reason-then-Act | model averaging=six mo... | 2026.05 | -8.1 | -0.95 | -8.5
Sparse + Reason-then-Act | model averaging=six mo... | 2026.05 | -9.9 | -1.13 | -8.8
No Tool + Reason-then-Act | model averaging=six mo... | 2026.05 | -12.4 | -1.18 | -10.5
No Tool (X) | model averaging=six mo... | 2026.05 | -18.1 | -0.72 | -25.2
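Reading the column header literally, the last column is ΔAcc divided by the negated ΔTC, that is, the accuracy change per unit of tool calls removed. The sketch below recomputes it from the rounded ΔAcc and ΔTC cells. It assumes (not stated on this page) that ΔAcc is a change in accuracy and ΔTC a change in tool-call count, both relative to an unnamed baseline; the helper name `acc_per_call_saved` is hypothetical.

```python
# Hypothetical helper: recompute the ΔAcc/-ΔTC column from the table's
# rounded ΔAcc and ΔTC values and compare it with the reported column.
# Assumption: ΔAcc = accuracy change, ΔTC = tool-call change vs. a baseline.

# method -> (ΔAcc, ΔTC, reported ΔAcc/-ΔTC), copied from the table
ROWS = {
    "Necessary (N)": (-0.3, -0.09, -3.5),
    "PROBE&PREFILL": (-1.1, -0.66, -1.6),
    "Sparse (S)": (-6.3, -0.55, -11.3),
    "Necessary + Reason-then-Act": (-8.1, -0.95, -8.5),
    "Sparse + Reason-then-Act": (-9.9, -1.13, -8.8),
    "No Tool + Reason-then-Act": (-12.4, -1.18, -10.5),
    "No Tool (X)": (-18.1, -0.72, -25.2),
}


def acc_per_call_saved(d_acc: float, d_tc: float) -> float:
    """Return ΔAcc / (-ΔTC): accuracy change per unit of tool calls removed."""
    if d_tc >= 0:
        # With no reduction in tool calls the ratio is undefined or inverted.
        raise ValueError("ratio is only meaningful when tool calls decrease (ΔTC < 0)")
    return d_acc / (-d_tc)


for name, (d_acc, d_tc, reported) in ROWS.items():
    recomputed = acc_per_call_saved(d_acc, d_tc)
    print(f"{name:28s} recomputed {recomputed:6.1f}  reported {reported:6.1f}")
```

Recomputing from the rounded cells lands within 0.2 of every reported value (for example, Necessary (N) gives -0.3 / 0.09 ≈ -3.3 against a reported -3.5), which suggests the leaderboard presumably divides unrounded ΔAcc and ΔTC values.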