
SciWorld

Benchmarks

Task Name | Dataset | Metric | SOTA Result | Trend
--- | --- | --- | --- | ---
Scientific Reasoning | SciWorld | Accuracy | 95.9 | 164
Science Simulation | SciWorld | Accuracy | 94.6 | 41
Embodied Agentic | SciWorld | Accuracy | 88.9 | 21
One-step next-observation prediction | SciWorld (test) | Token F1 | 96 | 16
Next-state prediction | SciWorld (SW) | EM Accuracy | 98.64 | 16
Task success | SciWorld | Real | 68.21 | 14
Scientific Reasoning | SciWorld | Success Rate (SR) | 59.48 | 14
Text-based embodied task | SciWorld | Success Rate | 77.77 | 13
Science simulation | SciWorld | Progress Rate | 82.6 | 12
Scientific Simulation | SciWorld | Measure Score | 55.7 | 10
Lifelong Agent Interaction | SciWorld | Success Rate (SR) | 73.5 | 10
Interactive Agent | SciWorld | Pass@1 | 29.45 | 10
Autonomous Exploration | SciWorld | Steps | 7.4 | 7
Science Experiment Simulation | SciWorld | Average Reward | 69.1 | 3