Prompting Decision Transformer for Few-Shot Policy Generalization

About

Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architectural inductive bias on few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations and encodes task-specific information to guide policy generation. Our experiments on five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra fine-tuning on unseen target tasks. Prompt-DT outperforms its variants and strong offline meta-RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to changes in prompt length and can generalize to out-of-distribution (OOD) environments.
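
To make the trajectory-prompt idea concrete, here is a minimal Python sketch of how a prompt might be assembled from few-shot demonstrations and prepended to the agent's recent history. It is an illustration under stated assumptions, not the authors' implementation: the function names (returns_to_go, sample_prompt, build_model_input), the (return-to-go, state, action) step layout, and the uniform random choice of segments are all hypothetical choices made for this example.

```python
import random
from typing import List, Sequence, Tuple

# One timestep as a (return-to-go, state, action) triple.
# This layout is an assumption made for the sketch.
Step = Tuple[float, Sequence[float], Sequence[float]]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Suffix sums of rewards: rtg[t] = r[t] + r[t+1] + ... + r[T-1]."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def sample_prompt(
    demos: Sequence[Sequence[Step]],
    num_segments: int,
    segment_len: int,
    rng: random.Random,
) -> List[Step]:
    """Concatenate short contiguous segments drawn from demonstrations.

    Hypothetical sampling scheme: each segment comes from a uniformly
    chosen demonstration at a uniformly chosen start index.
    """
    prompt: List[Step] = []
    for _ in range(num_segments):
        demo = rng.choice(demos)
        length = min(segment_len, len(demo))
        start = rng.randint(0, len(demo) - length)  # bounds are inclusive
        prompt.extend(demo[start:start + length])
    return prompt


def build_model_input(
    prompt: Sequence[Step],
    history: Sequence[Step],
    context_len: int,
) -> List[Step]:
    """Prepend the prompt to the most recent context_len history steps."""
    recent = list(history[-context_len:]) if context_len > 0 else []
    return list(prompt) + recent


if __name__ == "__main__":
    rtg = returns_to_go([1.0, 0.5, 2.0])
    # Toy demonstration: 1-D state equal to the timestep, constant action.
    demo = [(g, [float(t)], [0.0]) for t, g in enumerate(rtg)]
    prompt = sample_prompt([demo], num_segments=2, segment_len=2,
                           rng=random.Random(0))
    tokens = build_model_input(prompt, demo, context_len=1)
    print(len(prompt), len(tokens))
```

In this sketch the sequence model would be conditioned on the returned list as one input sequence, so the task-specific information reaches the policy through its context rather than through weight updates.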

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan • 2022

Related benchmarks

Task	Dataset	Metric	Value
Reinforcement Learning	HalfCheetah Vel	Average Episode Return	-38.6	10
Behavior Cloning	DeepMind Control (DMC) suite seen/unseen embodiments	Hopper Hop Score	0.9	9
Goal-oriented navigation	AGENT new concepts from new initial states (test)	Accuracy	57	9
Multi-task reinforcement learning	Meta-World MT50 (MT50-rand) V2 (Near-optimal)	Avg Success Rate	45.68	8
Driving	Driving (test)	Success Rate	0.00e+0	8
Multi-task reinforcement learning	Meta-World MT50-rand V2 (Sub-optimal)	Average Success Rate	39.76	6
Multi-task Robot Learning	Meta-World MT10	Success Rate	99	5
Multi-task Robot Learning	Meta-World MT50	Success Rate	97	5

