StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

About

Reinforcement Learning (RL) can be viewed as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions. In this work, we propose State-Action-Reward Transformer (StARformer) for visual RL, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. Our approach first extracts StAR-representations by self-attending over image state patches, action tokens, and reward tokens within a short temporal window. These are then combined with pure image state representations, extracted as convolutional features, to perform self-attention over the whole sequence. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, in both offline-RL and imitation learning settings. StARformer is also more compliant with longer sequences of inputs. Our code is available at https://github.com/elicassion/StARformer.
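The two-stage attention described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a local window of exactly one timestep, identity query/key/value projections in place of learned weights, mean pooling to collapse each local group into a single StAR-representation, and a precomputed vector `state_feat` standing in for the convolutional state features. The names `self_attention`, `mean_pool`, and `starformer_sketch` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, causal=False):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so there are no learned parameters here.
    With causal=True, token i only attends to tokens 0..i.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        visible = tokens[: i + 1] if causal else tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector (assumed pooling)."""
    d = len(tokens[0])
    return [sum(t[j] for t in tokens) / len(tokens) for j in range(d)]


def starformer_sketch(steps):
    """Two-stage attention over a trajectory.

    steps: list of dicts, one per timestep, each with
      'patches'    : list of d-dim image patch embeddings
      'action'     : d-dim action embedding
      'reward'     : d-dim reward embedding
      'state_feat' : d-dim whole-image feature (stand-in for conv features)
    Returns one output vector per sequence token (two tokens per timestep).
    """
    sequence = []
    for step in steps:
        # Local stage: attention only among the tokens of this timestep.
        local_tokens = [step["action"], step["reward"]] + step["patches"]
        star = mean_pool(self_attention(local_tokens))
        # Combine the StAR-representation with the pure state representation.
        sequence.extend([star, step["state_feat"]])
    # Global stage: causal attention over the whole sequence.
    return self_attention(sequence, causal=True)
```

Because the local stage only mixes tokens from one timestep and the global stage is causal, changing a later timestep cannot alter the outputs for earlier ones, which is the property an action-prediction head would rely on.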

Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo • 2021

Related benchmarks

Task	Dataset	Metric	Result	Rank
Locomotion	D4RL MuJoCo Tasks	Average D4RL Locomotion Score (v2)	66.2	29
Dexterous Manipulation	D4RL Adroit human cloned v2	Performance (Human Expert)	77.9	10

