Tool Calling on WHEN2TOOL Overall
[Chart: Δ Accuracy by date. Highlighted entry: Necessary (N), Δ Accuracy -1, May 10, 2026. Available metric views: Δ Accuracy, Δ TC, Δ Acc / -Δ TC.]
Evaluation Results
| Method | Setting | Links | Δ Accuracy | Δ TC | Δ Acc / -Δ TC |
|---|---|---|---|---|---|
| Necessary (N) | model averaging=six mo... | 2026.05 | -1 | -0.06 | -16.8 |
| PROBE&PREFILL | model averaging=six mo... | 2026.05 | -1.7 | -0.48 | -3.6 |
| Sparse (S) | model averaging=six mo... | 2026.05 | -8.4 | -0.46 | -18.4 |
| Necessary + Reason-then-Act | model averaging=six mo... | 2026.05 | -15.8 | -0.82 | -19.2 |
| Sparse + Reason-then-Act | model averaging=six mo... | 2026.05 | -19.7 | -1 | -19.6 |
| No Tool + Reason-then-Act | model averaging=six mo... | 2026.05 | -24.2 | -1.08 | -22.4 |
| No Tool (X) | model averaging=six mo... | 2026.05 | -28.7 | -0.71 | -40.5 |
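The last column appears to equal Δ Accuracy divided by the negated Δ TC. For example, for No Tool (X), -28.7 / 0.71 ≈ -40.4, and the table reports -40.5. The Python sketch below recomputes that ratio from the rounded table values as a consistency check. It makes three assumptions. First, "TC" denotes tool calls. Second, a negative Δ TC means fewer calls than the (unspecified) baseline. Third, a ratio closer to zero means less accuracy lost per tool call saved. None of these is stated on the page. The helper names are hypothetical and not part of any WHEN2TOOL tooling.

```python
# Rounded values copied from the Evaluation Results table:
# (method, Δ Accuracy, Δ TC, reported Δ Acc / -Δ TC)
ROWS = [
    ("Necessary (N)", -1.0, -0.06, -16.8),
    ("PROBE&PREFILL", -1.7, -0.48, -3.6),
    ("Sparse (S)", -8.4, -0.46, -18.4),
    ("Necessary + Reason-then-Act", -15.8, -0.82, -19.2),
    ("Sparse + Reason-then-Act", -19.7, -1.0, -19.6),
    ("No Tool + Reason-then-Act", -24.2, -1.08, -22.4),
    ("No Tool (X)", -28.7, -0.71, -40.5),
]


def acc_per_tc_saved(delta_acc: float, delta_tc: float) -> float:
    """Hypothetical helper: Δ Acc / -Δ TC.

    Assumes Δ TC < 0 (fewer tool calls than the baseline), so the
    denominator is positive and the sign of the result follows Δ Acc.
    """
    if delta_tc >= 0:
        raise ValueError("ratio is only defined here when tool calls decrease")
    return delta_acc / -delta_tc


def check_table(rows, tolerance=0.1):
    """Recompute each ratio and compare it with the reported value.

    Uses a relative tolerance because Δ TC is rounded to two decimals,
    and small denominators (e.g. -0.06) amplify that rounding error.
    """
    results = []
    for name, d_acc, d_tc, reported in rows:
        recomputed = acc_per_tc_saved(d_acc, d_tc)
        ok = abs(recomputed - reported) <= tolerance * abs(reported)
        results.append((name, round(recomputed, 2), reported, ok))
    return results


def rank_by_ratio(rows):
    """Method names ordered with the ratio closest to zero first.

    Reading "closer to zero" as "less accuracy lost per tool call saved"
    is an assumption about the metric, not something the page states.
    """
    ordered = sorted(rows, key=lambda r: acc_per_tc_saved(r[1], r[2]), reverse=True)
    return [name for name, _, _, _ in ordered]


if __name__ == "__main__":
    for name, recomputed, reported, ok in check_table(ROWS):
        print(f"{name:30s} recomputed={recomputed:7.2f} reported={reported:6.1f} ok={ok}")
    print("ranked by ratio:", rank_by_ratio(ROWS))
```

All seven recomputed ratios fall within 10% of the reported ones. Under the stated reading, PROBE&PREFILL ranks first on this metric, even though Necessary (N) has the smallest accuracy drop.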