In-Context Compositional Q-Learning for Offline Reinforcement Learning
About
Accurate estimation of the Q-function is a central challenge in offline reinforcement learning. However, existing approaches often rely on a shared global Q-function, which is inadequate for capturing the compositional structure of tasks that consist of diverse subtasks. We propose In-Context Compositional Q-Learning (ICQL), an offline RL framework that formulates Q-learning as a contextual inference problem and uses linear Transformers to adaptively infer local Q-functions from retrieved transitions without explicit subtask labels. Theoretically, we show that, under two assumptions -- linear approximability of the local Q-function and accurate inference of weights from retrieved context -- ICQL achieves a bounded approximation error for the Q-function and enables near-optimal policy extraction. Empirically, ICQL substantially improves performance in offline settings, achieving gains of up to 16.4% on kitchen tasks and up to 8.8% and 6.3% on MuJoCo and Adroit tasks, respectively. These results highlight the underexplored potential of in-context learning for robust and compositional value estimation and establish ICQL as a principled and effective framework for offline RL.
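The idea of inferring a local Q-function from retrieved transitions can be illustrated with a minimal sketch. This is not the ICQL implementation: closed-form ridge regression stands in for the weight inference that the abstract attributes to a linear Transformer, retrieval by Euclidean distance in state space is an assumption, and the names `retrieve_context`, `features`, `infer_local_weights`, and `local_q`, as well as the feature map and the `next_values` input, are hypothetical.

```python
import numpy as np


def retrieve_context(query_state, states, k):
    """Indices of the k dataset states nearest to query_state (Euclidean).

    Assumption: the abstract does not say how transitions are retrieved.
    """
    dists = np.linalg.norm(states - query_state, axis=1)
    return np.argsort(dists)[:k]


def features(state, action):
    """Hypothetical feature map phi(s, a): state, action, and a bias term."""
    return np.concatenate([state, action, [1.0]])


def infer_local_weights(phi, targets, ridge=1e-3):
    """Ridge regression, w = (Phi^T Phi + ridge * I)^-1 Phi^T y.

    Stand-in for in-context weight inference; not a linear Transformer.
    """
    d = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + ridge * np.eye(d), phi.T @ targets)


def local_q(query_state, query_action, states, actions, rewards,
            next_values, gamma=0.99, k=16):
    """Estimate Q(s, a) from a locally linear fit on retrieved transitions."""
    idx = retrieve_context(query_state, states, k)
    # Build the context: features and one-step TD targets of the neighbours.
    phi = np.stack([features(states[i], actions[i]) for i in idx])
    targets = rewards[idx] + gamma * next_values[idx]
    w = infer_local_weights(phi, targets)
    return float(features(query_state, query_action) @ w)
```

Here `next_values` is assumed to hold a bootstrap value estimate for each transition's next state. Because the weights are refit for every query, different regions of the state space can receive different linear Q-functions, which is the compositional behavior the abstract describes, without any subtask labels.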
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL | Walker2d (Medium Expert) Score | 113.3 | 11 |
| Goal-conditioned manipulation | D4RL Kitchen Tasks | Kitchen Complete v0 Success | 79.3 | 6 |
| Locomotion | D4RL MuJoCo Tasks | Walker2d Medium Expert v2 Score | 113.3 | 6 |
| Dexterous Manipulation | D4RL Adroit Tasks | Pen Success Rate (Human) | 85.6 | 6 |