Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

About

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection can be costly and risky, so offline RL becomes particularly challenging when in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning ability, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers that effectively uses pre-trained Language Models (LMs) for offline RL. The framework has four key components: (1) initializing Decision Transformers with sequentially pre-trained LMs; (2) using LoRA fine-tuning instead of full-weight fine-tuning, to effectively combine the pre-trained knowledge of the LM with in-domain knowledge; (3) generating embeddings with non-linear MLP transformations instead of linear projections; and (4) adding an auxiliary language prediction loss during fine-tuning to stabilize the LM and retain its original language abilities. Empirical results indicate that LaMo achieves excellent performance on sparse-reward tasks and closes the gap between value-based offline RL methods and Decision Transformers on dense-reward tasks. In particular, LaMo shows superior performance in scenarios with limited data samples.
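Component (2) can be illustrated with a minimal NumPy sketch of a LoRA-adapted linear layer. It assumes the formulation from the original LoRA paper: the pre-trained weight W stays frozen and only a low-rank update (alpha / r) * B @ A is trained, with B initialized to zero so the adapted layer starts out identical to the pre-trained one. The class name `LoRALinear`, the rank, alpha, and the 768-wide random stand-in weight are illustrative assumptions, not LaMo's actual code or hyperparameters.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration only; not taken from the LaMo codebase.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        d_out, d_in = weight.shape  # same (out, in) layout as a typical linear layer
        rng = np.random.default_rng(seed)
        self.weight = weight  # pre-trained weight, kept frozen during fine-tuning
        # A gets a small random init; B starts at zero, so B @ A is initially zero
        self.A = rng.normal(scale=0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        # Only A and B would be updated by the optimizer
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(768, 768))  # random stand-in for one pre-trained projection matrix
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 768))

# With B = 0 the adapted layer reproduces the frozen layer exactly
print(np.allclose(layer(x), x @ W.T))  # True
print(layer.trainable_params(), W.size)  # 6144 589824
```

Only A and B (6,144 values here) would be updated, versus all 589,824 entries of W under full-weight fine-tuning; the frozen W is what carries the pre-trained knowledge through fine-tuning.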

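Components (3) and (4) can be sketched in the same spirit. The NumPy code here contrasts a single linear state embedding with a two-layer ReLU MLP embedding, and combines an action loss with an auxiliary language loss as a weighted sum. The layer sizes, the function names, mean squared error for continuous actions, next-token cross-entropy for the language term, and the weight `lam` are all hypothetical choices for illustration; the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: a 17-dim state embedded into a 768-dim token space
state_dim, hidden_dim, embed_dim = 17, 128, 768


def linear_embed(s, W, b):
    # Single linear projection, as in a vanilla Decision Transformer
    return s @ W + b


def mlp_embed(s, W1, b1, W2, b2):
    # Two-layer MLP with a ReLU non-linearity instead of one linear map
    h = np.maximum(0.0, s @ W1 + b1)
    return h @ W2 + b2


def combined_loss(pred_actions, target_actions, lang_logits, lang_targets, lam=0.1):
    # Action term: mean squared error over continuous action dimensions
    action_loss = np.mean((pred_actions - target_actions) ** 2)
    # Language term: cross-entropy of token logits via a numerically stable log-softmax
    shifted = lang_logits - lang_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    lang_loss = -np.mean(log_probs[np.arange(len(lang_targets)), lang_targets])
    # Weighted sum (assumed form); lam balances the action and language terms
    return action_loss + lam * lang_loss


states = rng.normal(size=(4, state_dim))  # batch of 4 states
lin = linear_embed(states, rng.normal(size=(state_dim, embed_dim)), np.zeros(embed_dim))
mlp = mlp_embed(
    states,
    rng.normal(size=(state_dim, hidden_dim)), np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, embed_dim)), np.zeros(embed_dim),
)
print(lin.shape, mlp.shape)  # (4, 768) (4, 768)

# Perfect actions and uniform logits over 5 tokens: loss equals lam * log(5)
a = np.ones((3, 6))
print(combined_loss(a, a, np.zeros((3, 5)), np.array([0, 1, 2])))
```

Both embeddings produce tokens of the same width, so swapping the linear projection for an MLP leaves the transformer backbone untouched.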
Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Offline Reinforcement Learning | D4RL hopper-expert v2 | Normalized Score | 111.6 | 56
Offline Reinforcement Learning | D4RL halfcheetah-expert v2 | Normalized Score | 92 | 56
Offline Reinforcement Learning | D4RL walker2d-expert v2 | Normalized Score | 108.1 | 56
Offline Reinforcement Learning | D4RL antmaze-umaze (diverse) | Normalized Score | 70 | 40
Offline Reinforcement Learning | D4RL AntMaze-Umaze v0 | Average Normalized Score | 80 | 5
Offline Reinforcement Learning | D4RL Ant Medium-Replay v2 | Normalized Score | 92.7 | 4
Offline Reinforcement Learning | D4RL Ant Medium-Expert v2 | Normalized Score | 134.8 | 4
Offline Reinforcement Learning | D4RL Ant-Expert v2 | Normalized Score | 134.2 | 4
Offline Reinforcement Learning | D4RL Ant-Medium v2 | Normalized Score | 94.6 | 4
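The results above are D4RL normalized scores. Under the usual D4RL convention, a raw episode return is rescaled so that a random policy's reference return maps to 0 and an expert policy's reference return maps to 100, which is why several entries exceed 100. A minimal sketch of that rescaling follows; the function name and the reference returns in the example are made up for illustration. The real per-environment reference values ship with D4RL, whose environments expose `get_normalized_score`, which returns the 0 to 1 value before multiplying by 100.

```python
def d4rl_normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Rescale a raw return so random -> 0 and expert -> 100 (D4RL convention)."""
    if expert_return == random_return:
        raise ValueError("reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, not D4RL's real values for any environment
RANDOM_RETURN, EXPERT_RETURN = 100.0, 1100.0

print(d4rl_normalized_score(850.0, RANDOM_RETURN, EXPERT_RETURN))   # 75.0
print(d4rl_normalized_score(1200.0, RANDOM_RETURN, EXPERT_RETURN))  # 110.0, above the expert reference
```

Because the scale is anchored to fixed reference returns rather than capped, a policy that outperforms the expert reference scores above 100.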
