Value Prediction Network

About

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
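The core idea above can be sketched in code. The following is a minimal, illustrative mock-up (not the paper's actual architecture): toy linear/tanh stand-ins for VPN's modules — an encoder to abstract states, a value head, an option-conditional reward ("outcome") head, and an option-conditional transition — plus a simplified d-step lookahead. All names, shapes, and the plain max-backup are assumptions for clarity; the paper combines the learned value with the backed-up value at each depth.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_OPTIONS = 8, 3

# Toy stand-ins for the four VPN modules (weights are random; in the paper
# these are trained neural networks):
W_enc = rng.normal(size=(DIM, DIM))        # f_enc: observation -> abstract state
w_val = rng.normal(size=DIM)               # f_value: abstract state -> scalar value
W_out = rng.normal(size=(N_OPTIONS, DIM))  # f_out: option-conditional reward prediction
W_trans = [rng.normal(size=(DIM, DIM))     # f_trans: option-conditional next abstract state
           for _ in range(N_OPTIONS)]
GAMMA = 0.99

def encode(x):        return np.tanh(W_enc @ x)
def value(s):         return float(w_val @ s)
def outcome(s, o):    return float(W_out[o] @ s)       # predicted reward for option o
def transition(s, o): return np.tanh(W_trans[o] @ s)   # predicted next abstract state

def plan(s, depth):
    """Simplified d-step lookahead over abstract states: expand each option,
    back up predicted reward plus discounted value of the predicted next state.
    (A plain max-backup; the paper mixes in the learned value at each depth.)"""
    if depth == 0:
        return value(s)
    q_values = [outcome(s, o) + GAMMA * plan(transition(s, o), depth - 1)
                for o in range(N_OPTIONS)]
    return max(q_values)

x = rng.normal(size=DIM)   # a fake observation
s = encode(x)
print("1-step planned value:", plan(s, 1))
print("3-step planned value:", plan(s, 3))
```

Note that planning never predicts future observations — the rollout happens entirely in the abstract state space, which is the contrast with observation-prediction models drawn above.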

Junhyuk Oh, Satinder Singh, Honglak Lee · 2017

Related benchmarks

Task | Dataset | Metric | Result | Rank
---- | ------- | ------ | ------ | ----
Collect | Collect Stochastic Original | Average Reward | 8.11 | 6
Collect | Collect Stochastic MWs | Average Reward | 7.46 | 6
Reinforcement Learning | Procgen easy levels zero-shot generalization, 16 games (test) | bigfish | 0.2969 | 6
Collect | Collect Stochastic, FGs | Average Reward | 4.45 | 6
Collect | Collect Deterministic Original | Average Reward | 9.29 | 6
Collect | Collect Deterministic, FGs | Average Reward | 5.43 | 6
Collect | Collect Deterministic MWs | Average Reward | 8.31 | 6
Atari Game Playing | Atari 2600 (test) | Frostbite | 3.81e+3 | 2

Other info

Code
