Decoupling Return-to-Go for Efficient Decision Transformer
About
The Decision Transformer (DT) casts offline reinforcement learning as a sequence modeling problem. It conditions its action predictions on Return-to-Go (RTG), using RTG both to distinguish trajectory quality during training and to guide action generation at inference. In this work, we identify a critical redundancy in this design: feeding the entire sequence of RTGs into the Transformer is theoretically unnecessary, since only the most recent RTG affects action prediction, and we show experimentally that this redundancy can impair DT's performance. To resolve this, we propose the Decoupled Decision Transformer (DDT). DDT simplifies the architecture by passing only the observation and action sequences through the Transformer and using the latest RTG alone to guide action prediction. This streamlined design not only improves performance but also reduces computational cost. Our experiments show that DDT significantly outperforms DT and is competitive with state-of-the-art DT variants across multiple offline RL tasks.
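The conditioning signal in both DT and DDT is the per-timestep return-to-go, i.e. the sum of rewards from the current step to the end of the trajectory. A minimal sketch of how RTG targets are typically computed from an offline trajectory's rewards (the function name and the `gamma` parameter are illustrative, not from the paper):

```python
def returns_to_go(rewards, gamma=1.0):
    """Compute RTG_t = sum_{k>=t} gamma^(k-t) * r_k for each timestep t.

    With gamma=1.0 (the undiscounted convention used by Decision
    Transformer), RTG_t is simply the suffix sum of rewards.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):  # accumulate from the end
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# Example: a 4-step trajectory with rewards [1, 0, 2, 1].
# RTG at t=0 is the full return; at the final step it is just the last reward.
print(returns_to_go([1.0, 0.0, 2.0, 1.0]))  # [4.0, 3.0, 3.0, 1.0]
```

At inference time the conditioning value is decremented by the observed reward after each environment step; the point of DDT is that only this latest value, not the whole RTG sequence, needs to reach the Transformer.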
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah v2 (medium-replay) | Normalized Score | 37.8 | 58 |
| Offline Reinforcement Learning | D4RL walker-medium-replay v2 (test) | Normalized Reward | 77.6 | 16 |
| Offline Reinforcement Learning | D4RL Walker-medium-expert v2 | Normalized Return | 109.5 | 16 |
| Offline Reinforcement Learning | D4RL hopper-medium v2 (test) | Normalized Reward | 99.4 | 8 |
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert v2 (test) | Normalized Reward | 94.2 | 8 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert v2 (test) | Normalized Reward | 1.11e+4 | 8 |
| Offline Reinforcement Learning | D4RL hopper-medium-replay v2 (test) | Normalized Reward | 9.25e+3 | 8 |
| Offline Reinforcement Learning | D4RL halfcheetah medium v2 (test) | Normalized Reward | 43 | 8 |