Guiding Pretraining in Reinforcement Learning with Large Language Models

About

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent's current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks. Code is available at https://github.com/yuqingd/ellm.
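To make the mechanism concrete, here is a minimal sketch of the ELLM reward loop in Python. It is illustrative only: `caption`, `query_llm`, and `embed` are hypothetical stubs standing in for an environment captioner, an LLM API call, and a sentence encoder, and the similarity threshold is an assumed parameter rather than a value taken from the paper.

```python
import numpy as np

# Stand-in components: a real system would pair an environment captioner,
# an LLM API, and a sentence encoder. These stubs only make the sketch
# self-contained and runnable.
def caption(state) -> str:
    return str(state)

def query_llm(prompt: str) -> list[str]:
    # Placeholder: imagine the LLM returning goal suggestions for the prompt.
    return ["chop a tree", "drink water", "craft a pickaxe"]

def embed(text: str) -> np.ndarray:
    # Deterministic random vector per string, standing in for a text encoder.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.normal(size=64)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def ellm_reward(state, transition_caption: str, threshold: float = 0.5) -> float:
    """Intrinsic reward: did the agent's latest transition achieve any goal
    the LLM suggests for its current state?"""
    # 1. Describe the state in text and ask the LLM for plausible goals.
    goals = query_llm(f"State: {caption(state)}\nSuggest useful goals:")
    # 2. Score the achieved transition against each suggested goal.
    achieved = embed(transition_caption)
    similarities = [cosine(embed(goal), achieved) for goal in goals]
    # 3. Pay out the best match, but only above a similarity threshold,
    #    so behavior unrelated to any suggestion earns nothing.
    best = max(similarities, default=0.0)
    return best if best > threshold else 0.0

print(ellm_reward({"inventory": "axe"}, "You chopped a tree."))
```

Thresholding keeps the intrinsic reward sparse: behavior that matches none of the suggested goals earns nothing, which is what steers exploration toward the LLM's common-sense goals rather than generic novelty. The full method re-queries suggestions as the state changes; this sketch keeps only the core scoring step.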

Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas • 2023

Related benchmarks

| Task | Benchmark | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Stack Green Block on Yellow Block | SimplerEnv WidowX+Bridge | Grasp Green Block Success Rate | 20.8 | 14 |
| Average (SIMPLER Bridge Tasks) | SIMPLER-Bridge | Success Rate | 1.7 | 10 |
| Carrot2Plate | SIMPLER Bridge tasks | Success Rate | 2.7 | 9 |
| Eggplant2Bask | SIMPLER Bridge tasks | Success Rate | 0.0 | 9 |
| Spoon2Cloth | SIMPLER-Bridge | Success Rate | 0.0 | 9 |
| Long-horizon manipulation | LIBERO-10 Long | Success Rate | 0.0 | 7 |
| Robotic Control | Robotic Control Inference Benchmark | Inference Time (ms) | 24000 | 6 |
