Value Prediction Network
About
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
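The planning idea described above can be illustrated with a minimal sketch. The functions below (`value`, `outcome`, `transition`, and the discount/option set) are toy stand-ins invented for illustration; in VPN each is a learned neural module operating on abstract states, and the paper's backup additionally mixes the direct value estimate with the expanded values rather than using a pure max, as done here for brevity.

```python
# Hedged sketch of d-step lookahead planning over a learned abstract model,
# in the spirit of VPN. All module functions below are hypothetical toys.
GAMMA = 0.9        # discount factor (assumption, not from the paper)
OPTIONS = [0, 1]   # a toy option set

def value(s):
    # value module: estimates the value of abstract state s
    return float(s)

def outcome(s, o):
    # outcome module: predicts the immediate reward of option o
    return 1.0 if o == 1 else 0.0

def transition(s, o):
    # transition module: predicts the next abstract state
    return s + (1 if o == 1 else -1)

def plan_q(s, o, depth):
    """d-step lookahead Q-value: predicted immediate reward plus the
    discounted planned value of the predicted next abstract state."""
    r = outcome(s, o)
    s_next = transition(s, o)
    if depth <= 1:
        return r + GAMMA * value(s_next)
    # expand the best option at the predicted next state, one step shallower
    return r + GAMMA * max(plan_q(s_next, o2, depth - 1) for o2 in OPTIONS)

q = plan_q(0.0, 1, depth=3)  # -> 4.897 with these toy modules
```

Note that no observation is ever reconstructed: planning happens entirely in the abstract state space, which is what lets VPN sidestep the difficulty of observation prediction in stochastic environments.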
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Collect | Collect Stochastic Original | Average Reward | 8.11 | 6 |
| Collect | Collect Stochastic MWs | Average Reward | 7.46 | 6 |
| Reinforcement Learning | Procgen easy levels zero-shot generalization 16 games (test) | bigfish | 0.2969 | 6 |
| Collect | Collect Stochastic, FGs | Average Reward | 4.45 | 6 |
| Collect | Collect Deterministic Original | Average Reward | 9.29 | 6 |
| Collect | Collect Deterministic, FGs | Average Reward | 5.43 | 6 |
| Collect | Collect Deterministic MWs | Average Reward | 8.31 | 6 |
| Atari Game Playing | Atari 2600 (test) | Frostbite | 3.81e+3 | 2 |