Data Science Agent tasks on xBench-DS
[Chart: Pass@1 over time. Highlighted point: TodoEvolve + Smolagents, Pass@1 0.75, Feb 8, 2026.]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Pass@1 |
| --- | --- | --- | --- |
| TodoEvolve + Smolagents | Model Family=GPT-5-Min... | 2026.02 | 0.75 |
| TodoEvolve + Smolagents | Model Family=DeepSeek... | 2026.02 | 0.74 |
| TodoEvolve + Smolagents | Model Family=Kimi K2,... | 2026.02 | 0.71 |
| Flash-Searcher | Model Family=GPT-5-min... | 2026.02 | 0.69 |
| Agent KB | Model Family=GPT-4.1,... | 2026.02 | 0.68 |
| Flash-Searcher | Model Family=DeepSeek... | 2026.02 | 0.68 |
| Flash-Searcher | Model Family=Kimi K2,... | 2026.02 | 0.66 |
| Agent KB | Model Family=GPT-4.1,... | 2026.02 | 0.58 |
| Cognitive Kernel-Pro | Model Family=Claude-3.... | 2026.02 | 0.56 |
| OWL Workforce | Model Family=GPT-4o+o3... | 2026.02 | 0.55 |
| Smolagents | Model Family=GPT-5-min... | 2026.02 | 0.51 |
| Agent KB | Model Family=GPT-4.1,... | 2026.02 | 0.48 |
| OAgents | Model Family=Claude-3.... | 2026.02 | 0.47 |