StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

About

Reinforcement Learning (RL) can be considered a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions. In this work, we propose State-Action-Reward Transformer (StARformer) for visual RL, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. Our approach first extracts StAR-representations by self-attending image state patches, action, and reward tokens within a short temporal window. These are then combined with pure image state representations, extracted as convolutional features, to perform self-attention over the whole sequence. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, in both offline-RL and imitation learning settings. StARformer also handles longer input sequences better. Our code is available at https://github.com/elicassion/StARformer.
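
The two-stage attention described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes a short window of a single timestep, identity query/key/value projections with no learned weights, mean pooling to collapse each local group into one StAR token, and an interleaved order of StAR and convolutional tokens in the long sequence. The names `self_attention`, `mean_pool`, and `star_sequence` are hypothetical, chosen only for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, causal=False):
    # Single-head scaled dot-product self-attention.
    # Assumption: identity projections stand in for learned Q/K/V weights.
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        # With causal=True, token i only attends to tokens 0..i.
        visible = tokens[: i + 1] if causal else tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out


def mean_pool(tokens):
    # Collapse a group of tokens into one vector (an assumption of this sketch).
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def star_sequence(steps):
    # steps: list of dicts holding 'patches' (list of vectors) plus
    # 'action', 'reward', and 'conv' vectors, all of the same dimension.
    sequence = []
    for step in steps:
        # Stage 1: local attention over patch, action, and reward tokens.
        local = step["patches"] + [step["action"], step["reward"]]
        star = mean_pool(self_attention(local))
        # Stage 2 input: pair the StAR token with the convolutional state feature.
        sequence.extend([star, step["conv"]])
    # Stage 2: causal attention over the whole trajectory.
    return self_attention(sequence, causal=True)


def make_step(scale):
    # Toy 4-dimensional tokens; real inputs would be learned embeddings.
    return {
        "patches": [[scale, 0.0, 0.0, 0.0], [0.0, scale, 0.0, 0.0]],
        "action": [0.0, 0.0, 1.0, 0.0],
        "reward": [0.0, 0.0, 0.0, scale],
        "conv": [0.5 * scale] * 4,
    }


trajectory = [make_step(1.0), make_step(2.0), make_step(3.0)]
outputs = star_sequence(trajectory)
print(len(outputs), len(outputs[0]))
```

Each timestep contributes two tokens to the long sequence, so the output holds twice as many vectors as there are steps. Because the second stage is causal, the outputs for early timesteps do not change when later timesteps are appended.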

Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo • 2021

Related benchmarks

Task                    Dataset                       Result                                     Rank
Locomotion              D4RL MuJoCo Tasks             Average D4RL Locomotion Score (v2): 66.2   29
Dexterous Manipulation  D4RL Adroit human cloned v2   Performance (Human Expert): 77.9           10
