Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining

About

Deep visuomotor policy learning, which aims to map raw visual observations to actions, achieves promising results in control tasks such as robotic manipulation and autonomous driving. However, it requires a huge number of online interactions with the training environment, which limits its real-world application. Compared to the popular unsupervised feature learning for visual recognition, feature pretraining for visuomotor control tasks is much less explored. In this work, we aim to pretrain policy representations for driving tasks by watching hours-long uncurated YouTube videos. Specifically, we train an inverse dynamics model with a small amount of labeled data and use it to predict action labels for all the YouTube video frames. A new contrastive policy pretraining method is then developed to learn action-conditioned features from the video frames with pseudo action labels. Experiments show that the resulting action-conditioned features yield substantial improvements on downstream reinforcement learning and imitation learning tasks, outperforming both weights pretrained with previous unsupervised learning methods and ImageNet-pretrained weights. Code, model weights, and data are available at: https://metadriverse.github.io/ACO.
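The action-conditioned contrastive step described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how positive pairs are chosen, so the rule used here (two frames are positives when their scalar pseudo action labels differ by at most a threshold), the cosine similarity, the temperature, and every function name are assumptions made for illustration only.

```python
import math


def action_positive_mask(actions, threshold):
    """mask[i][j] is True when frames i and j (i != j) have pseudo action
    labels within `threshold` of each other (assumed pairing rule)."""
    n = len(actions)
    return [
        [i != j and abs(actions[i] - actions[j]) <= threshold for j in range(n)]
        for i in range(n)
    ]


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def action_conditioned_nce(features, actions, threshold=0.05, temperature=0.1):
    """InfoNCE-style loss whose positives are picked by action similarity.

    `features` holds one embedding per frame; `actions` holds the scalar
    pseudo label (e.g. steering) predicted for that frame. Hypothetical API.
    """
    n = len(features)
    mask = action_positive_mask(actions, threshold)
    losses = []
    for i in range(n):
        # Similarity of the anchor to every other frame in the batch.
        logits = {
            j: cosine(features[i], features[j]) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if mask[i][j]]
        if not positives:
            continue  # anchor has no action-matched partner in this batch
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-likelihood over the anchor's positives.
        losses.append(-sum(logits[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0
```

Under these assumptions, the loss is small when frames with similar pseudo actions already have similar embeddings and large when they do not, which is the pressure that would push the encoder toward action-relevant features. In practice the pseudo labels would come from the inverse dynamics model, and the embeddings from the encoder being pretrained.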

Qihang Zhang, Zhenghao Peng, Bolei Zhou • 2022

Related benchmarks

Task	Dataset	Result
Reach	Meta-World ML-1 (test)	Success Rate: 52	9
Autonomous Driving	CARLA Map 1 (Source)	Cumulative Reward: 2.27e+3	6
Autonomous Driving	CARLA Map 2 (Seen Target)	Sum of Rewards: 2.36e+3	6
Autonomous Driving	CARLA Map 2 (Unseen Target)	Cumulative Reward: 2.42e+3	6
Autonomous Driving	CARLA Map 1 (Unseen Target)	Cumulative Reward: 1.33e+3	6
Autonomous Driving	CARLA Map 1 (Seen Target)	Sum of Rewards: 1.55e+3	6
Autonomous Driving	CARLA Map 2 (Source)	Cumulative Reward: 2.27e+3	6
Object Goal Navigation	AI2THOR Source domains	Success Rate: 55	6
Object Goal Navigation	AI2THOR Seen target domains	Success Rate: 39.6	6
Object Goal Navigation	AI2THOR Unseen target domains	Success Rate: 35.8	6

Showing 10 of 21 rows
