WorldModelBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Physical Reasoning Evaluation | WorldModelBench | General Score 41.8 | 9 |
| Video Generation | WorldModelBench | Instruction Score 2.18 | 7 |
| World modeling | WorldModelBench Robot in office scenario (test) | Total Score 6.9 | 2 |
| World modeling | WorldModelBench Outdoor vehicle scenario (test) | Total Score 5.8 | 2 |
| World modeling | WorldModelBench Vehicle FSI scenario (test) | Total Score 6.8 | 2 |
| World modeling | WorldModelBench Robot in office | Instruction Following Score 2 | 2 |
| World modeling | WorldModelBench Outdoor vehicle | INSTR Score 1 | 2 |
| World modeling | WorldModelBench Vehicle FSI | Instruction Following Score 2.9 | 2 |
| World Modeling | WorldModelBench Aggregated across three scenarios | Instruction Score 5.9 | 2 |