Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making

About

The recent success of Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising model based on Transformer. However, we discovered that the attention module of DT is not appropriate to capture the inherent local dependence pattern in trajectories of RL modeled as a Markov decision process. To overcome the limitations of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure to process multiple entities in parallel and understand the interrelationship among the multiple entities. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.
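To make the idea of "local convolution filtering as the token mixer" concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not specify the filter, so the causal, per-channel (depthwise) form, the filter length, and the names `causal_depthwise_conv` and `metaformer_block` are all assumptions made for illustration. The block keeps only the residual token-mixing step of a MetaFormer-style block and omits normalization and the channel MLP.

```python
from typing import List

Tokens = List[List[float]]  # shape: (num_tokens, hidden_dim)


def causal_depthwise_conv(tokens: Tokens, kernels: List[List[float]]) -> Tokens:
    """Mix each token with its recent predecessors, one filter per channel.

    kernels[c][j] is the (hypothetical) weight that channel c gives to the
    token j steps in the past; j = 0 is the current token.
    """
    num_tokens = len(tokens)
    hidden_dim = len(tokens[0]) if tokens else 0
    mixed = [[0.0] * hidden_dim for _ in range(num_tokens)]
    for t in range(num_tokens):
        for c in range(hidden_dim):
            for j, weight in enumerate(kernels[c]):
                if t - j >= 0:  # positions before the sequence start count as zero
                    mixed[t][c] += weight * tokens[t - j][c]
    return mixed


def metaformer_block(tokens: Tokens, kernels: List[List[float]]) -> Tokens:
    """Residual connection around the token mixer (norm and MLP omitted)."""
    mixed = causal_depthwise_conv(tokens, kernels)
    return [[x + m for x, m in zip(row, mixed_row)]
            for row, mixed_row in zip(tokens, mixed)]


# Three tokens with two channels each; each filter looks back one step.
tokens = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
kernels = [[0.5, 0.5], [1.0, 0.0]]
print(causal_depthwise_conv(tokens, kernels))
print(metaformer_block(tokens, kernels))
```

Unlike attention, where every token can weight every earlier token, each output here depends only on a fixed window of neighbouring tokens. That fixed window is the local dependence pattern the abstract attributes to trajectories from a Markov decision process.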

Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung • 2023

Related benchmarks

| Task | Dataset | Result (Normalized Score) | Rank |
|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | 93.1 | 169 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | 110.1 | 161 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | 109.2 | 132 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | 87.8 | 109 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | 43 | 105 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | 79.6 | 104 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | 41.1 | 97 |
| Offline Reinforcement Learning | hopper medium | 92.5 | 68 |
| Offline Reinforcement Learning | D4RL walker2d medium-replay | 77.1 | 62 |
| Offline Reinforcement Learning | walker2d medium-replay | 76.6 | 61 |
Showing 10 of 59 rows
