Sequential Decision Making on ScienceWorld
Best Pass@1 Success Rate: 70.4 (AutoRefine)
[Chart: Pass@1 Success Rate and Required Steps of reported results, plotted by date (axis label: Jan 30, 2026)]
Evaluation Results
| Method | Backbone | Date | Pass@1 Success Rate | Required Steps |
|---|---|---|---|---|
| AutoRefine | GPT-4-turbo | 2026.01 | 70.4 | 16.5 |
| ReAct + Reflexion | GPT-4-turbo | 2026.01 | 69.2 | 40.2 |
| Reflexion | GPT-4-turbo | 2026.01 | 67.4 | 42 |
| ReAct | GPT-4-turbo | 2026.01 | 61.8 | 26.2 |
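Comparing entries on this leaderboard is plain arithmetic over the two reported metrics. As a minimal sketch, the snippet below re-enters the four GPT-4-turbo rows by hand, ranks them by Pass@1 Success Rate, and computes the gap between two methods. The `Entry` class and the `ranked` and `compare` helpers are hypothetical illustrations, not part of the leaderboard or of any listed method; using fewer Required Steps as the tie-breaker is an assumption that lower step counts are preferable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical container, values copied from the results table)."""
    method: str
    pass_at_1: float  # Pass@1 Success Rate as reported
    steps: float      # Required Steps as reported


ENTRIES = [
    Entry("AutoRefine", 70.4, 16.5),
    Entry("ReAct + Reflexion", 69.2, 40.2),
    Entry("Reflexion", 67.4, 42.0),
    Entry("ReAct", 61.8, 26.2),
]


def ranked(entries):
    """Sort by Pass@1 (highest first); ties broken by fewer steps (assumed better)."""
    return sorted(entries, key=lambda e: (-e.pass_at_1, e.steps))


def compare(a, b):
    """Return (Pass@1 gap of a over b in points, relative step reduction of a vs b)."""
    gap = round(a.pass_at_1 - b.pass_at_1, 1)
    step_reduction = round(1 - a.steps / b.steps, 3)
    return gap, step_reduction


if __name__ == "__main__":
    table = ranked(ENTRIES)
    for rank, e in enumerate(table, start=1):
        print(f"{rank}. {e.method}: Pass@1={e.pass_at_1}, steps={e.steps}")
    best, runner_up = table[0], table[1]
    gap, reduction = compare(best, runner_up)
    print(f"{best.method} vs {runner_up.method}: +{gap} points, {reduction:.1%} fewer steps")
```

Rounding inside `compare` keeps floating-point noise (for example, 70.4 - 69.2 is not exactly 1.2 in binary) out of the reported figures.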