DeepMDP: Learning Continuous Latent Space Models for Representation Learning
About
Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. We connect these results to prior work in the bisimulation literature, and explore the use of a variety of metrics. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations in a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
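The two tractable losses can be sketched on a single transition `(s, a, r, s')`. This is a minimal NumPy illustration, not the paper's implementation: the encoder and latent models are plain linear maps with made-up dimensions, and a deterministic L2 term stands in for the distributional (Wasserstein-style) transition loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
OBS_DIM, LATENT_DIM, N_ACTIONS = 16, 4, 3

# Linear stand-ins for the learned networks: encoder phi,
# latent reward model, and latent transition model.
phi_W = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.1
reward_W = rng.normal(size=(LATENT_DIM + N_ACTIONS, 1)) * 0.1
trans_W = rng.normal(size=(LATENT_DIM + N_ACTIONS, LATENT_DIM)) * 0.1

def one_hot(a):
    v = np.zeros(N_ACTIONS)
    v[a] = 1.0
    return v

def deepmdp_losses(s, a, r, s_next):
    """The two DeepMDP losses evaluated on one transition (s, a, r, s')."""
    z = s @ phi_W            # latent state phi(s)
    z_next = s_next @ phi_W  # encoded next state phi(s')
    za = np.concatenate([z, one_hot(a)])
    # (1) Reward loss: squared error between the latent reward model's
    #     prediction and the observed reward.
    reward_loss = float((za @ reward_W - r) ** 2)
    # (2) Transition loss: distance between the predicted next latent state
    #     and phi(s'); L2 here stands in for the distributional term.
    transition_loss = float(np.sum((za @ trans_W - z_next) ** 2))
    return reward_loss, transition_loss

s, s_next = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
rl, tl = deepmdp_losses(s, a=1, r=0.5, s_next=s_next)
print(rl >= 0 and tl >= 0)  # prints True: both losses are squared errors
```

In training, both losses would be minimized jointly over minibatches of environment transitions, updating the encoder and both latent models by gradient descent.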
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Reinforcement Learning | CARLA (GP scenario) | ER: 82 | 15 |
| Autonomous Driving | CARLA (HW scenario) | Error Rate: 155 | 15 |
| Visual Reinforcement Learning | CarRacing-v0 (test) | Environment Reward: 3.56e+5 | 11 |
| Driving Policy | CARLA JW scenario | Episode Reward: 146 | 7 |
| Driving Policy | CARLA HB scenario | Episode Reward: 101 | 7 |
| Driving Policy | CARLA HW scenario | Episode Reward: 182 | 7 |
| Reinforcement Learning | Procgen easy levels zero-shot generalization 16 games (test) | bigfish: -0.2969 | 6 |
| Visual Reinforcement Learning | CARLA Scenario A (test) | ER: 186 | 6 |
| Visual Reinforcement Learning | CARLA Scenario B (test) | Error Rate (ER): 139 | 6 |