Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models

About

Reinforcement learning presents an attractive paradigm to reason about several distinct aspects of sequential decision making, such as specifying complex goals, planning future observations and actions, and critiquing their utilities. However, integrating these capabilities poses competing algorithmic challenges in retaining maximal expressivity while allowing for flexibility in modeling choices for efficient learning and inference. We present Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into three generative modules. These modules simulate the temporal evolution of observations, rewards, and actions via independent generative models that can be learned in parallel via teacher forcing. Our framework guarantees both expressivity and flexibility in designing individual modules to account for key factors such as architectural bias, optimization objective and dynamics, transferability across domains, and inference speed. Our empirical results demonstrate the effectiveness of Decision Stacks for offline policy optimization for several MDP and POMDP environments, outperforming existing methods and enabling flexible generative decision making.
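
The modular decomposition described above can be illustrated with a small sketch. It assumes the three modules are chained at inference time in the order observations, then rewards, then actions; that ordering, the function `plan_with_stack`, and the three `toy_*` models are hypothetical stand-ins chosen for illustration, not the authors' implementation. In practice each module would be a learned generative model, trained independently with teacher forcing.

```python
from typing import Callable, List, Tuple

Trajectory = List[float]  # one scalar per time step, to keep the toy example small


def plan_with_stack(
    observation_model: Callable[[float, float, int], Trajectory],
    reward_model: Callable[[Trajectory, float], Trajectory],
    action_model: Callable[[Trajectory, Trajectory], Trajectory],
    current_obs: float,
    goal: float,
    horizon: int,
) -> Tuple[Trajectory, Trajectory, Trajectory]:
    """Chain three independent modules: observations, then rewards, then actions."""
    observations = observation_model(current_obs, goal, horizon)
    rewards = reward_model(observations, goal)
    actions = action_model(observations, rewards)
    return observations, rewards, actions


# Hypothetical stand-ins for learned generative models (not from the paper).
def toy_observation_model(current_obs: float, goal: float, horizon: int) -> Trajectory:
    # Linearly interpolate from the current observation to the goal.
    return [current_obs + (goal - current_obs) * t / horizon for t in range(horizon + 1)]


def toy_reward_model(observations: Trajectory, goal: float) -> Trajectory:
    # Reward is the negative distance to the goal at each step.
    return [-abs(goal - obs) for obs in observations]


def toy_action_model(observations: Trajectory, rewards: Trajectory) -> Trajectory:
    # Action is the change between consecutive observations.
    # The toy model ignores rewards; a learned module could condition on them.
    return [nxt - cur for cur, nxt in zip(observations, observations[1:])]


observations, rewards, actions = plan_with_stack(
    toy_observation_model, toy_reward_model, toy_action_model,
    current_obs=0.0, goal=1.0, horizon=4,
)
```

Because each module only sees the outputs of the ones before it, any single module can be swapped for a different architecture without touching the others, which is the kind of flexibility the abstract refers to.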

Siyan Zhao, Aditya Grover • 2023

Related benchmarks

Task | Dataset | Result | Rank
Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score 95.7 | 117
Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score 110.9 | 115
Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score 108 | 86
Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score 89.5 | 72
Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score 47.8 | 59
Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score 41.1 | 59
Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score 83.6 | 58
Offline Reinforcement Learning | D4RL Medium-Replay Walker2d | Normalized Score 80.7 | 34
Offline Reinforcement Learning | D4RL Medium Hopper | Normalized Score 76.6 | 26
Offline Goal-Conditioned Planning | D4RL Maze2D Single Goal v0 | Average Score 131.5 | 14
Showing 10 of 17 rows
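
For readers unfamiliar with the "Normalized Score" metric, the sketch below shows the usual D4RL convention: a raw episode return is rescaled so that a random policy maps to 0 and an expert policy maps to 100. The function name `d4rl_normalized_score` and the reference returns in the example are made-up placeholders, not the actual per-environment reference values.

```python
def d4rl_normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Rescale a raw return so that random -> 0 and expert -> 100."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Placeholder reference returns, chosen only to make the arithmetic easy to follow.
halfway = d4rl_normalized_score(raw_return=100.0, random_return=0.0, expert_return=200.0)
above_expert = d4rl_normalized_score(raw_return=250.0, random_return=0.0, expert_return=200.0)
```

Under this convention a score above 100, such as the 110.9 reported for hopper-medium-expert, means the policy's return exceeded the expert reference return.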
