
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

About

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data -- here, online videos of people playing Minecraft -- from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish.
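The three stages described above (fit an inverse dynamics model on a small labeled set, use it to pseudo-label action-free videos, then train a behavioral prior on the result) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the 1-D world, the `GOAL` position, the lookup-table "models", and all function names are hypothetical stand-ins, not the paper's networks or data. The final constant only checks the abstract's arithmetic that 20 minutes at 20 Hz is 24,000 actions.

```python
from collections import Counter, defaultdict

GOAL = 5  # hypothetical target position in a toy 1-D world


def expert_action(pos):
    """Toy demonstrator: step toward GOAL (+1 / -1), then stay (0)."""
    if pos < GOAL:
        return 1
    if pos > GOAL:
        return -1
    return 0


def rollout(start, steps):
    """Return (observations, actions) for one toy episode."""
    obs, acts = [start], []
    for _ in range(steps):
        a = expert_action(obs[-1])
        acts.append(a)
        obs.append(obs[-1] + a)
    return obs, acts


def train_idm(labeled):
    """Stage 1: fit a toy inverse dynamics model on a small labeled set.

    Maps the observed change obs[t+1] - obs[t] to its most frequent action.
    """
    votes = defaultdict(Counter)
    for obs, acts in labeled:
        for t, a in enumerate(acts):
            votes[obs[t + 1] - obs[t]][a] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}


def pseudo_label(idm, videos):
    """Stage 2: attach IDM-predicted actions to action-free observation sequences."""
    labeled = []
    for obs in videos:
        acts = [idm.get(obs[t + 1] - obs[t], 0) for t in range(len(obs) - 1)]
        labeled.append((obs, acts))
    return labeled


def behavior_clone(dataset):
    """Stage 3: toy behavioral prior, the most frequent action per observation."""
    votes = defaultdict(Counter)
    for obs, acts in dataset:
        for t, a in enumerate(acts):
            votes[obs[t]][a] += 1
    return {o: c.most_common(1)[0][0] for o, c in votes.items()}


# Small labeled set (observations and actions) versus a larger unlabeled set
# (observations only, standing in for online videos).
labeled = [rollout(s, 12) for s in (0, 10)]
videos = [rollout(s, 25)[0] for s in range(-5, 16)]

idm = train_idm(labeled)
policy = behavior_clone(pseudo_label(idm, videos))

# Abstract's arithmetic: 20 minutes * 60 s/min * 20 actions/s.
ACTIONS_PER_20_MIN = 20 * 60 * 20
```

In this toy setting the cloned policy recovers the demonstrator's behavior on every position that appears in the unlabeled sequences, even though only two episodes carried action labels. The real method replaces both lookup tables with learned models over video frames.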

Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| General Robot Manipulation | SimplerEnv | Average Success Rate | 51 | 23 |
| Craft Items | MCU | Crafting Table Success Rate | 50 | 8 |
| Kill Entities | MCU | Pig Success Rate | 55 | 8 |
| Smelt Items | MCU | Furnace Success Rate | 10 | 8 |
| Mine Blocks | MCU | Log Success Rate | 15 | 8 |
| Embodied Tasks | MCU All set | Steps | 377 | 6 |
| Embodied Tasks | MCU Mini | SR | 10.1 | 6 |
| Open-Ended Instruction Task Execution | Minecraft Open-Ended Instruction Tasks (test) | Torch Success Rate | 11 | 6 |
| Atomic Tasks | MineRL | Logs Collected | 2.6 | 6 |
| Combat Tasks | MCU Mini | Success Rate | 3.6 | 6 |
Showing 10 of 13 rows
