
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

About

We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling -- achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially solves the task -- near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks -- the analog of ImageNet pre-training + task-specific fine-tuning for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code are publicly available).

Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra • 2019
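
The scaling and compute figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is not from the paper: the variable names are illustrative, the 30-day month is an assumption, and the "steps per second of human experience" figure is only one possible reading of the 80-year comparison.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# All inputs are copied from the abstract; everything derived is an estimate.

# Near-linear scaling: 107x speedup on 128 GPUs.
SPEEDUP = 107.0
GPUS_SCALING = 128
scaling_efficiency = SPEEDUP / GPUS_SCALING  # fraction of ideal linear speedup

# Training budget: under 3 days of wall-clock time on 64 GPUs.
GPUS_TRAINING = 64
WALL_CLOCK_DAYS = 3  # upper bound, since the abstract says "under 3 days"
gpu_days = GPUS_TRAINING * WALL_CLOCK_DAYS
gpu_months = gpu_days / 30.0  # assumption: 30-day months

# Experience: 2.5 billion steps described as 80 years of human experience.
TOTAL_STEPS = 2.5e9
HUMAN_YEARS = 80
SECONDS_PER_YEAR = 365.25 * 24 * 3600
implied_steps_per_second = TOTAL_STEPS / (HUMAN_YEARS * SECONDS_PER_YEAR)

print(f"scaling efficiency: {scaling_efficiency:.1%}")
print(f"GPU time (upper bound): {gpu_days} GPU-days, about {gpu_months:.1f} months")
print(f"implied steps per second of experience: {implied_steps_per_second:.2f}")
```

Under these assumptions the 128-GPU run reaches roughly 84% of ideal linear speedup, the 64-GPU budget is bounded by 192 GPU-days (in line with the "over 6 months" quoted), and the 80-year comparison works out to about one step per second.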

Related benchmarks

Task                           Dataset                                     Result                  Rank
ObjectGoal Navigation          MP3D (val)                                  Success Rate 8          68
Object Goal Navigation         HM3D-OVON Seen (val)                        SR 39.2                 44
Object Goal Navigation         HM3D v1 (val)                               Success Rate (SR) 27.9  34
ObjectNav (Label goal)         Gibson tiny (test)                          Success Rate 13.9       20
ObjectNav                      Gibson (val)                                Success Rate 15         18
System Throughput Measurement  Embodied Rearrangement open-fridge (train)  Mean SPS 1.07e+3        16
Object Navigation              CoIN-Bench Seen Synonyms (val)              SPL 11.7                13
Image-Goal Navigation          Gibson Curved trajectories (unseen)         Succ (Easy) 22.2        12
Object Navigation              OVON unseen (val)                           SR 18.6                 12
ObjectGoal Navigation          MP3D (test-std)                             Success Rate 0.062      11
Showing 10 of 35 rows
