ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

About

Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks. We demonstrate the power and potential of ProcTHOR via a sample of 10,000 generated houses and a simple neural model. Models trained using only RGB images on ProcTHOR, with no explicit mapping and no human task supervision, produce state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the presently running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We also demonstrate strong 0-shot results on these benchmarks, via pre-training on ProcTHOR with no fine-tuning on the downstream benchmark, often beating previous state-of-the-art systems that access the downstream training data.

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi • 2022

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Object Goal Navigation	HM3D v1 (val)	Success Rate (SR)	54.4	65
Object Navigation	HM3D v1 (val)	SR	54.4	32
Object Navigation	HM3D ObjNav	Success Rate (SR)	20.2	22
Object Navigation	HM3D v1	SR	54.4	18
Simulator Throughput	EAI (Embodied AI) Simulators	Train SPS	300	13
Embodied AI Simulation	Embodied AI Simulators Comparison	Number of Assets	1.63e+3	10
Object Goal Navigation	HM3D Habitat 2022 ObjectNav challenge (val)	Success Rate (SR)	54.4	9
Object Goal Navigation	RoboTHOR ObjectNav challenge 2020/2021 (val)	Success Rate (SR)	65.2	9
Object Navigation	HM3D Standard (test)	Success Rate	54	7
Perceptual Scene Synthesis Evaluation	Amazon Mechanical Turk (AMT) Perceptual Study (test)	Mean Error Frequency	0.252	5

Showing 10 of 11 rows
