Data Science Task Automation on xBench-DS
[Chart: Score over time. Highlighted point: MemEvolve + Flash-Searcher, Score 78, Dec 21, 2025. Updated 4d ago.]
Evaluation Results
Method                        Model Family    Date      Score
MemEvolve + Flash-Searcher    GPT-5-min...    2025.12   78
MemEvolve + Flash-Searcher    GPT-5-min...    2025.12   77
MemEvolve + Flash-Searcher    GPT-5-min...    2025.12   74
MemEvolve + Flash-Searcher    DeepSeek...     2025.12   70
Flash-Searcher                GPT-5-min...    2025.12   69
Agent KB                      GPT-4.1,...     2025.12   68
Flash-Searcher                DeepSeek...     2025.12   68
MemEvolve + SmolAgent         GPT-5-min...    2025.12   68
MemEvolve + Flash-Searcher    Kimi K2,...     2025.12   68
Flash-Searcher                Kimi K2,...     2025.12   66
MemEvolve + SmolAgent         GPT-5-min...    2025.12   63
Agent KB                      GPT-4.1,...     2025.12   58
MemEvolve + SmolAgent         GPT-5-min...    2025.12   57
Cognitive Kernel-Pro          Claude-3....    2025.12   56
OWL Workforce                 GPT-4o+o3...    2025.12   55
Smolagents                    GPT-5-mini      2025.12   51
Agent KB                      GPT-4.1,...     2025.12   48
OAgents                       Claude-3....    2025.12   47