In-Context Compositional Q-Learning for Offline Reinforcement Learning

About

Accurate estimation of the Q-function is a central challenge in offline reinforcement learning. However, existing approaches often rely on a shared global Q-function, which is inadequate for capturing the compositional structure of tasks that consist of diverse subtasks. We propose In-context Compositional Q-Learning (ICQL), an offline RL framework that formulates Q-learning as a contextual inference problem and uses linear Transformers to adaptively infer local Q-functions from retrieved transitions without explicit subtask labels. Theoretically, we show that, under two assumptions -- linear approximability of the local Q-function and accurate inference of weights from retrieved context -- ICQL achieves a bounded approximation error for the Q-function and enables near-optimal policy extraction. Empirically, ICQL substantially improves performance in offline settings, achieving gains of up to 16.4% on kitchen tasks and up to 8.8% and 6.3% on MuJoCo and Adroit tasks, respectively. These results highlight the underexplored potential of in-context learning for robust and compositional value estimation and establish ICQL as a principled and effective framework for offline RL.

Qiushui Xu, Yuhao Huang, Yushu Jiang, Lei Song, Jinyu Wang, Wenliang Zheng, Jiang Bian • 2025
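
To make the idea of inferring a local Q-function from retrieved transitions concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a Euclidean nearest-neighbour retriever, a hypothetical feature map `features`, a placeholder next-state value function `next_value`, and it replaces the trained linear Transformer with a single gradient step from zero weights on the squared TD error (a computation that one unnormalized linear-attention readout can express). All function names and the toy dataset are illustrative assumptions.

```python
import math


def retrieve(query_state, dataset, k):
    """Return the k transitions whose states are closest to query_state."""
    return sorted(dataset, key=lambda t: math.dist(t["s"], query_state))[:k]


def features(state, action):
    """Hypothetical feature map phi(s, a): state, action, and a bias term."""
    return list(state) + list(action) + [1.0]


def infer_local_weights(context, next_value, gamma=0.99, lr=0.5):
    """One gradient step from w = 0 on the squared TD error over the context.

    Starting from w = 0, the step is w = (lr / k) * sum_i y_i * phi_i.
    This is a stand-in (assumption) for the learned in-context inference.
    """
    dim = len(features(context[0]["s"], context[0]["a"]))
    w = [0.0] * dim
    for t in context:
        y = t["r"] + gamma * next_value(t["s_next"])  # TD target
        phi = features(t["s"], t["a"])
        for j in range(dim):
            w[j] += lr * y * phi[j] / len(context)
    return w


def local_q(state, action, dataset, next_value, k=4):
    """Estimate Q(s, a) with weights inferred only from nearby transitions."""
    context = retrieve(state, dataset, k)
    w = infer_local_weights(context, next_value)
    return sum(wj * pj for wj, pj in zip(w, features(state, action)))


# Toy offline dataset with two unlabeled "subtasks": states near 0 yield
# reward +1, states near 10 yield reward -1 (made-up numbers).
dataset = []
for i in range(4):
    dataset.append({"s": [0.1 * i], "a": [1.0], "r": 1.0, "s_next": [0.1 * i]})
    dataset.append({"s": [10.0 + 0.1 * i], "a": [1.0], "r": -1.0,
                    "s_next": [10.0 + 0.1 * i]})


def zero_value(state):
    """Placeholder next-state value; a real system would bootstrap."""
    return 0.0


q_near_zero = local_q([0.15], [1.0], dataset, zero_value)
q_near_ten = local_q([10.15], [1.0], dataset, zero_value)
print(q_near_zero, q_near_ten)
```

Because the weights are inferred separately for each query from its retrieved neighbours, the two regions get different local estimates without any subtask label, which is the behaviour a single shared global Q-function struggles to capture.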

Related benchmarks

Task	Dataset	Result
Locomotion	D4RL MuJoCo Tasks	Average D4RL Locomotion Score (v2): 80.6	29
Dexterous Manipulation	D4RL Adroit Tasks	--	12
Offline Reinforcement Learning	D4RL	Walker2d (Medium Expert) Score: 113.3	11
Goal-conditioned manipulation	D4RL Kitchen Tasks	Kitchen Complete v0 Success: 79.3	6
