
Prompting Decision Transformer for Few-Shot Policy Generalization

About

Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architectural inductive bias on few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations and encodes task-specific information to guide policy generation. Our experiments on five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.
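The abstract describes a trajectory prompt built from short segments of demonstration trajectories, where each timestep is represented by a (return-to-go, state, action) triple, as in the Decision Transformer input format. A minimal sketch of assembling such a prompt is below; the helper name, data layout, and shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def build_trajectory_prompt(demos, k=5):
    """Assemble a trajectory prompt from demonstration segments.

    For each demo, take the first k timesteps and stack
    (return-to-go, state, action) along the feature axis, then
    concatenate segments along the time axis. Hypothetical helper:
    field names and shapes are assumptions for illustration.
    """
    segments = []
    for traj in demos:
        rewards = np.asarray(traj["rewards"], dtype=float)
        # return-to-go at step t = sum of rewards from t to the end
        rtg = np.cumsum(rewards[::-1])[::-1]
        seg = np.concatenate(
            [rtg[:k, None], traj["states"][:k], traj["actions"][:k]],
            axis=-1,
        )
        segments.append(seg)
    # prompt tokens would be prepended to the recent context before
    # being fed to the causal Transformer
    return np.concatenate(segments, axis=0)

# toy example: 2 demonstrations of length 8, state dim 3, action dim 2
rng = np.random.default_rng(0)
demos = [
    {
        "states": rng.normal(size=(8, 3)),
        "actions": rng.normal(size=(8, 2)),
        "rewards": np.ones(8),
    }
    for _ in range(2)
]
prompt = build_trajectory_prompt(demos, k=5)
print(prompt.shape)  # (10, 6): 2 demos x 5 timesteps, 1 + 3 + 2 features
```

In the paper's setting this prompt stays fixed at evaluation time: no gradient updates are performed on the unseen task, and the Transformer conditions on the prompt tokens alone to infer task-specific behavior.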

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Behavior Cloning | DeepMind Control (DMC) suite, seen/unseen embodiments | Hopper Hop Score | 0.9 | 9 |
| Goal-oriented navigation | AGENT, new concepts from new initial states (test) | Accuracy | 57 | 9 |
| Multi-task reinforcement learning | Meta-World MT50 (MT50-rand) V2 (Near-optimal) | Avg Success Rate | 45.68 | 8 |
| Driving | Driving (test) | Success Rate | 0.00 | 8 |
| Multi-task reinforcement learning | Meta-World MT50-rand V2 (Sub-optimal) | Average Success Rate | 39.76 | 6 |
| Multi-task Robot Learning | Meta-World MT10 | Success Rate | 99 | 5 |
| Multi-task Robot Learning | Meta-World MT50 | Success Rate | 97 | 5 |
