PHYRE: A New Benchmark for Physical Reasoning

About

Understanding and reasoning about physics is an important ability of intelligent agents. We develop the PHYRE benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles. We test several modern learning algorithms on PHYRE and find that these algorithms fall short in solving the puzzles efficiently. We expect that PHYRE will encourage the development of novel sample-efficient agents that learn efficient but useful models of physics. For code and to play PHYRE for yourself, please visit https://player.phyre.ai.

Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick • 2019

Related benchmarks

Task                Dataset                                  Metric                  Result  Rank
Physical Reasoning  PHYRE-1B within-template (test)          AUCCESS                 77.6    7
Physical Reasoning  PHYRE-1B cross-template (test)           AUCCESS                 36.8    7
Physical Reasoning  PHYRE Within-template 1.0                Success Rate (AUCCESS)  77.6    6
Physical Reasoning  PHYRE Cross-template 1.0                 Success Rate            34.5    6
Physical Reasoning  PHYRE-2B within-template (test)          AUCCESS                 67.8    5
Physical Reasoning  PHYRE-2B cross-template (test)           AUCCESS                 23.2    5
Planning            PHYRE within-task generalization B-tier  AUCCESS                 77.6    3
Planning            PHYRE cross-task generalization B-tier   AUCCESS                 36.8    3
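
The AUCCESS metric reported above is not defined on this page. As a rough sketch, assuming the definition given in the PHYRE paper, AUCCESS is a weighted average of s_k, the fraction of tasks solved within k attempts, for k = 1 to 100, with weights w_k = log(k + 1) - log(k) that favor solving in few attempts. The function name `auccess` and its input format (one first-success attempt index per task, or `None` if unsolved) are hypothetical choices for illustration and are not the benchmark's own API.

```python
import math


def auccess(first_success_attempts, max_attempts=100):
    """Sketch of AUCCESS under the assumed definition from the PHYRE paper.

    first_success_attempts: one entry per task, holding the 1-based index of
    the first successful attempt, or None if the task was never solved.
    (Hypothetical input format chosen for this example.)
    Returns a value in [0, 1].
    """
    n_tasks = len(first_success_attempts)
    if n_tasks == 0:
        raise ValueError("need at least one task")

    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, max_attempts + 1):
        # Earlier attempts get larger weights: w_k = log(k + 1) - log(k).
        w_k = math.log(k + 1) - math.log(k)
        # s_k: fraction of tasks solved within the first k attempts.
        solved = sum(
            1 for a in first_success_attempts if a is not None and a <= k
        )
        s_k = solved / n_tasks
        weighted_sum += w_k * s_k
        weight_total += w_k
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Three tasks: solved on attempt 1, solved on attempt 10, never solved.
    print(auccess([1, 10, None]))
```

The function returns a fraction between 0 and 1. The results in the table appear to be on a 0 to 100 scale, so multiplying by 100 would put the output in the same units, assuming the same definition is in use.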