
ScienceWorld

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Interactive Reasoning | ScienceWorld (Seen) | Success Rate | 63.91 | 31 |
| Interactive Science Reasoning | ScienceWorld (test) | Score | 84.6 | 27 |
| Interactive Decision-making | ScienceWorld Unseen (test) | Success Rate | 58.94 | 24 |
| World Modeling | ScienceWorld | Matter Score | 52.8 | 20 |
| Agentic task completion | ScienceWorld | L0 Score | 75 | 18 |
| scientific reasoning | ScienceWorld Unseen | Average Reward | 58.5 | 10 |
| scientific reasoning | ScienceWorld Seen | Average Reward | 71.6 | 10 |
| Textual Environment Interaction | ScienceWorld | Base Score | 96.06 | 8 |
| Interactive Science Simulation | ScienceWorld v1.0 (test) | Task 1-1 (L) Score | 97.04 | 8 |
| Interactive Reasoning | ScienceWorld (Unseen) | Success Rate | 0.5862 | 7 |
| Scientific Reasoning in Text-based Environments | ScienceWorld (test) | Task 1-1 Score | 44.8 | 7 |
| Science simulation and text-based scientific reasoning | ScienceWorld variations (test) | Changes of State: Boiling Success | 4 | 7 |
| Sequential Decision Making | ScienceWorld | Pass@1 Success Rate | 70.4 | 4 |
| scientific reasoning | ScienceWorld | Seen Accuracy | 74.5 | 3 |