Single-agent tool use on API-Bank reconstructed
[Chart: Correctness by method. Top result: 79.27 (Naive Agent), data point dated Jan 17, 2026. Selectable metrics: Correctness, FP Rate, Execution Time (s).]
Evaluation Results

| Method | Tool Num | Date | Correctness | FP Rate | Execution Time (s) |
|---|---|---|---|---|---|
| Naive Agent | 2 | 2026.01 | 79.27 | - | 10.43 |
| SEAgent | 1 | 2026.01 | 74.73 | 0 | 9 |
| SEAgent | 2 | 2026.01 | 71.34 | 0 | 11.16 |
| Naive Agent | 1 | 2026.01 | 70.33 | - | 10.8 |
| SEAgent | >= 3 | 2026.01 | 67.91 | 5.13 | 12.79 |
| Naive Agent | >= 3 | 2026.01 | 63.43 | - | 15.54 |
| IsolateGPT | 1 | 2026.01 | 53.95 | 5.26 | 21.98 |
| IsolateGPT | 2 | 2026.01 | 37.32 | 18.31 | 34.08 |
| IsolateGPT | >= 3 | 2026.01 | 37.25 | 5.88 | 64.42 |
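The results above are sorted by Correctness, which interleaves the three methods. As a rough aid for comparing them, the sketch below regroups the same numbers by method and by Tool Num setting. The names `ROWS`, `mean_correctness`, and `best_per_setting` are illustrative and not part of the benchmark. The per-method mean is an unweighted average over the three Tool Num settings; that is an assumption made here for convenience, since the page does not say how many tasks fall in each setting.

```python
from collections import defaultdict

# Rows copied from the results above:
# (method, tool_num, correctness, fp_rate, execution_time_s).
# fp_rate is None where the page shows "-".
ROWS = [
    ("Naive Agent", "2", 79.27, None, 10.43),
    ("SEAgent", "1", 74.73, 0.0, 9.0),
    ("SEAgent", "2", 71.34, 0.0, 11.16),
    ("Naive Agent", "1", 70.33, None, 10.8),
    ("SEAgent", ">= 3", 67.91, 5.13, 12.79),
    ("Naive Agent", ">= 3", 63.43, None, 15.54),
    ("IsolateGPT", "1", 53.95, 5.26, 21.98),
    ("IsolateGPT", "2", 37.32, 18.31, 34.08),
    ("IsolateGPT", ">= 3", 37.25, 5.88, 64.42),
]


def mean_correctness(rows):
    """Unweighted mean Correctness per method across Tool Num settings."""
    by_method = defaultdict(list)
    for method, _tool_num, correctness, _fp_rate, _time_s in rows:
        by_method[method].append(correctness)
    return {m: sum(vals) / len(vals) for m, vals in by_method.items()}


def best_per_setting(rows):
    """Highest-Correctness (method, score) for each Tool Num setting."""
    best = {}
    for method, tool_num, correctness, _fp_rate, _time_s in rows:
        if tool_num not in best or correctness > best[tool_num][1]:
            best[tool_num] = (method, correctness)
    return best


if __name__ == "__main__":
    for method, mean in sorted(mean_correctness(ROWS).items()):
        print(f"{method}: mean Correctness {mean:.2f}")
    for tool_num, (method, score) in best_per_setting(ROWS).items():
        print(f"Tool Num={tool_num}: best is {method} ({score})")
```

Grouping this way shows that the top-ranked row (Naive Agent, Tool Num=2) is the best result for that one setting only; the leader can differ in the other settings.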