Tool-augmented reasoning on API-Bank
[Chart: Success Rate over time. Top result: GenEnv, 79.1 Success Rate (Dec 22, 2025)]
Evaluation Results
| Method | Scale group | Date | Success Rate |
|---|---|---|---|
| GenEnv | 7B Models,... | 2025.12 | 79.1 |
| Llama 3.1 405B | Large Scal... | 2025.12 | 74.4 |
| Qwen 3 14B | Large Scal... | 2025.12 | 66.7 |
| ReSearch | 7B Models,... | 2025.12 | 65.3 |
| Llama 3.1 70B | Large Scal... | 2025.12 | 64.3 |
| Qwen 3 32B | Large Scal... | 2025.12 | 63.8 |
| SearchR1 | 7B Models,... | 2025.12 | 63.3 |
| Qwen 2.5 7B | 7B Models,... | 2025.12 | 61.6 |
| Qwen 2.5 72B | Large Scal... | 2025.12 | 54.9 |
| ToRL | 7B Models,... | 2025.12 | 54.1 |
| GPT-OSS 120B | Large Scal... | 2025.12 | 53.6 |
| GPT-OSS 20B | Large Scal... | 2025.12 | 41.2 |