Agentic Instruction Following on TaskCraft
Performance Score: 79.33 (MemEvolve + Flash-Searcher)
[Chart: Performance Score; data point dated Dec 21, 2025]
Updated 4d ago
Evaluation Results
Method                      Method details             Links    Performance Score
MemEvolve + Flash-Searcher  Model Family=GPT-5-min...  2025.12  79.33
MemEvolve + SmolAgent       Model Family=GPT-5-min...  2025.12  77
Agent KB                    Model Family=GPT-4.1,...   2025.12  75.33
MemEvolve + SmolAgent       Model Family=GPT-5-min...  2025.12  75
MemEvolve + Flash-Searcher  Model Family=GPT-5-min...  2025.12  75
Agent KB                    Model Family=GPT-4.1,...   2025.12  72.67
MemEvolve + Flash-Searcher  Model Family=DeepSeek...   2025.12  72.67
MemEvolve + Flash-Searcher  Model Family=GPT-5-min...  2025.12  72
Flash-Searcher              Model Family=GPT-5-min...  2025.12  69.67
Flash-Searcher              Model Family=DeepSeek...   2025.12  69.33
MemEvolve + Flash-Searcher  Model Family=Kimi K2,...   2025.12  68
MemEvolve + SmolAgent       Model Family=GPT-5-min...  2025.12  67.67
Cognitive Kernel-Pro        Model Family=Claude-3....  2025.12  66
Smolagents                  Model Family=GPT-5-mini    2025.12  64
Agent KB                    Model Family=GPT-4.1,...   2025.12  61.67
OWL Workforce               Model Family=GPT-4o+o3...  2025.12  58.33
Flash-Searcher              Model Family=Kimi K2,...   2025.12  58