Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

About

Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling specific target problems, particularly in real-time dynamic environments. Additionally, deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. On the other hand, reinforcement learning (RL) approaches train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the guidance from the teacher agent, the student agent can distill the prior knowledge of the LLM into its own model. Consequently, the student agent can be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses the capabilities of its teacher for completing the target task. We conducted experiments on challenging MiniGrid and Habitat environments, specifically designed for embodied AI research, to evaluate the effectiveness of our framework. The results clearly demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
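The abstract does not give the training objective, so the following is only a minimal sketch of one common way to combine teacher guidance with environment feedback: an RL loss plus a KL distillation term whose weight is annealed to zero, so that the student relies on the teacher early on and on environment reward later. The function names, the linear schedule, and the choice of KL(teacher || student) are assumptions made for illustration, not details taken from the paper or its repository.

```python
import math


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for two discrete action distributions."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


def teacher_weight(step, anneal_steps, initial_weight=1.0):
    """Linearly decay the teacher-guidance weight to zero (hypothetical schedule)."""
    if anneal_steps <= 0:
        return 0.0
    return initial_weight * max(0.0, 1.0 - step / anneal_steps)


def student_loss(rl_loss, teacher_probs, student_probs, step, anneal_steps):
    """Add a distillation term, weighted by the schedule, to a given RL loss."""
    lam = teacher_weight(step, anneal_steps)
    return rl_loss + lam * kl_divergence(teacher_probs, student_probs)
```

Under these assumptions, once `step` reaches `anneal_steps` the distillation term vanishes and the student is trained on the RL loss alone, which is one way a student could continue improving past its teacher.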

Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu • 2023

Related benchmarks

Task | Dataset | Result | Rank
Visual Reinforcement Learning | DMControl Cartpole, Swingup | Episode Return 820 | 16
Visual Reinforcement Learning | DMControl Walker Walk | Episode Return 696 | 16
Visual Reinforcement Learning | DMControl Ball in cup, Catch | Episode Return 905 | 16
Visual Reinforcement Learning | DMControl Cheetah Run | Episode Return 367 | 16
Visual Reinforcement Learning | DMControl Finger, Spin | Episode Return 839 | 16
Visual Reinforcement Learning | DMControl Reacher Easy | Episode Return 292 | 16
Visual Reinforcement Learning | CARLA (#GP scenario) | ER 107 | 15
Autonomous Driving | CARLA (#HW) | Error Rate 146 | 15
Visual Reinforcement Learning | CarRacing v0 (test) | Environment Reward 1.57e+5 | 11
Multi-robot long-horizon planning | MAT-THOR Basic | TCR (%) 36 | 6
Showing 10 of 16 rows
