Tool-use on Task-Bench
Best Task Completion Rate: 58.2 (Reasoningbank, Feb 3, 2026)
Evaluation Results
| Method | LLM Backbone | Date | Task Completion Rate |
| --- | --- | --- | --- |
| Reasoningbank | Qwen3-32B | 2026.02 | 58.2 |
| No-Memory | Qwen3-32B | 2026.02 | 56.8 |
| Reasoningbank+prompt | Qwen3-32B | 2026.02 | 56.4 |
| DC | Qwen3-32B | 2026.02 | 54 |
| Reasoningbank+Guard | Qwen3-32B | 2026.02 | 52.2 |
| Memento | Qwen3-32B | 2026.02 | 51.8 |
| TAME | Qwen3-32B | 2026.02 | 51.8 |
| No-Memory | GPT-5.2 | 2026.02 | 48.6 |
| TAME | GPT-5.2 | 2026.02 | 46.8 |
| Reasoningbank+Guard | GPT-5.2 | 2026.02 | 45.4 |
| Reasoningbank | GPT-5.2 | 2026.02 | 44.8 |
| Reasoningbank+prompt | GPT-5.2 | 2026.02 | 44.2 |
| DC | GPT-5.2 | 2026.02 | 43.8 |
| Memento | GPT-5.2 | 2026.02 | 40 |