
In-Context Compositional Q-Learning for Offline Reinforcement Learning

About

Accurate estimation of the Q-function is a central challenge in offline reinforcement learning. However, existing approaches often rely on a shared global Q-function, which is inadequate for capturing the compositional structure of tasks that consist of diverse subtasks. We propose In-context Compositional Q-Learning (ICQL), an offline RL framework that formulates Q-learning as a contextual inference problem and uses linear Transformers to adaptively infer local Q-functions from retrieved transitions without explicit subtask labels. Theoretically, we show that, under two assumptions -- linear approximability of the local Q-function and accurate inference of weights from retrieved context -- ICQL achieves a bounded approximation error for the Q-function and enables near-optimal policy extraction. Empirically, ICQL substantially improves performance in offline settings, achieving gains of up to 16.4% on kitchen tasks and up to 8.8% and 6.3% on MuJoCo and Adroit tasks, respectively. These results highlight the underexplored potential of in-context learning for robust and compositional value estimation and establish ICQL as a principled and effective framework for offline RL.
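The theoretical assumptions above can be illustrated with a minimal sketch: if the local Q-function is linear in some feature map phi(s, a), then the weights of that local Q-function can be inferred from a batch of retrieved context transitions, here by ordinary least squares standing in for the linear Transformer's in-context inference. All names (`Phi`, `q_targets`, `w_hat`) and the synthetic data are illustrative assumptions, not the paper's actual architecture or retrieval mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: the local Q-function is linear in features phi(s, a).
d = 8            # feature dimension (hypothetical)
n_context = 64   # number of retrieved context transitions (hypothetical)

w_true = rng.normal(size=d)                  # unknown local Q-function weights
Phi = rng.normal(size=(n_context, d))        # phi(s_i, a_i) for retrieved transitions
q_targets = Phi @ w_true + 0.01 * rng.normal(size=n_context)  # noisy Q targets

# Infer the local weights from the retrieved context via least squares --
# a stand-in for the in-context inference the linear Transformer performs.
w_hat, *_ = np.linalg.lstsq(Phi, q_targets, rcond=None)

phi_query = rng.normal(size=d)               # phi(s, a) at the query point
q_hat = float(phi_query @ w_hat)             # locally inferred Q-value
```

Under these two assumptions (linear approximability, accurate weight inference from context), the inferred Q-value tracks the true local Q-function, which is the intuition behind the bounded approximation error the paper proves.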

Qiushui Xu, Yuhao Huang, Yushu Jiang, Lei Song, Jinyu Wang, Wenliang Zheng, Jiang Bian • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Offline Reinforcement Learning | D4RL | Walker2d (Medium Expert) Score | 113.3 | 11
Goal-conditioned Manipulation | D4RL Kitchen Tasks | Kitchen Complete v0 Success | 79.3 | 6
Locomotion | D4RL MuJoCo Tasks | Walker2d Medium Expert v2 Score | 113.3 | 6
Dexterous Manipulation | D4RL Adroit Tasks | Pen Success Rate (Human) | 85.6 | 6
