Scientific Reasoning on ScienceWorld
Top Seen Accuracy: 74.5 (Co-Evolving Agents)
[Chart: Seen Accuracy and Unseen Accuracy by date; date shown: Nov 27, 2025]
Updated 4d ago
Evaluation Results
Method               Model              Date     Seen Accuracy  Unseen Accuracy
Co-Evolving Agents   Llama-2-13B-chat   2025.11  74.5           65.5
ETO                  Llama-2-13B-chat   2025.11  72.6           65.3
SFT                  Llama-2-13B-chat   2025.11  64.5           56.1