Interactive Reasoning on ScienceWorld (Unseen)
[Chart: Success Rate over time. Top entry: AEC, 0.5862 (Feb 3, 2026)]
Evaluation Results
Method      Backbone    Date      Success Rate
AEC         Llama3-8B   2026.02   0.5862
WKM         Llama3-8B   2026.02   0.5475
ETO         Llama3-8B   2026.02   0.5233
KnowAgent   Llama3-8B   2026.02   0.4918
NAT         Llama3-8B   2026.02   0.4876
Reflexion   Llama3-8B   2026.02   0.2541
ReAct       Llama3-8B   2026.02   0.2266