Our new X account is live! Follow @wizwand_team for updates
SOTA Agent Task benchmarks and papers with code | Wizwand
Agent Task
Benchmarks
Dataset Name   SOTA Method      Metric               Value   Results   Last Updated
AlfWorld       DeepSeek-R1      Success Rate         83.6    21        4d ago
WebShop        Gemini 2.5 Pro   Success Rate         43.2    17        4d ago
Sudoku         Gemini 2.5 Pro   Success Rate (SR)    99      17        4d ago
FrozenLake     Gemini 2.5 Pro   Success Rate         100     17        4d ago
BlocksWorld    gpt-oss-120b     Success Rate         100     17        4d ago