
SciWorld

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Scientific Reasoning | SciWorld | Accuracy | 95.9 | 164 |
| Science Simulation | SciWorld | Accuracy | 94.6 | 41 |
| Embodied Agentic | SciWorld | Accuracy | 88.9 | 21 |
| Next-state prediction | SciWorld (SW) | EM Accuracy | 98.64 | 16 |
| Task success | SciWorld | Real | 68.21 | 14 |
| Scientific Reasoning | SciWorld | Success Rate (SR) | 59.48 | 14 |
| Science simulation | Sciworld | Progress Rate | 82.6 | 12 |
| Interactive Agent | SciWorld | Pass@1 | 29.45 | 10 |
| Science Experiment Simulation | SciWorld | Average Reward | 69.1 | 3 |