Value Prediction Network
About
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
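The planning idea described above can be illustrated with a minimal sketch. The functions below (`value`, `outcome`, `transition`, and the discount/option set) are toy stand-ins invented for illustration; in VPN each is a learned neural module operating on abstract states, and the paper's backup additionally mixes the direct value estimate with the expanded values rather than using a pure max, as done here for brevity.

```python
# Hedged sketch of d-step lookahead planning over a learned abstract model,
# in the spirit of VPN. All module functions below are hypothetical toys.
GAMMA = 0.9        # discount factor (assumption, not from the paper)
OPTIONS = [0, 1]   # a toy option set

def value(s):
    # value module: estimates the value of abstract state s
    return float(s)

def outcome(s, o):
    # outcome module: predicts the immediate reward of option o
    return 1.0 if o == 1 else 0.0

def transition(s, o):
    # transition module: predicts the next abstract state
    return s + (1 if o == 1 else -1)

def plan_q(s, o, depth):
    """d-step lookahead Q-value: predicted immediate reward plus the
    discounted planned value of the predicted next abstract state."""
    r = outcome(s, o)
    s_next = transition(s, o)
    if depth <= 1:
        return r + GAMMA * value(s_next)
    # expand the best option at the predicted next state, one step shallower
    return r + GAMMA * max(plan_q(s_next, o2, depth - 1) for o2 in OPTIONS)

q = plan_q(0.0, 1, depth=3)  # -> 4.897 with these toy modules
```

Note that no observation is ever reconstructed: planning happens entirely in the abstract state space, which is what lets VPN sidestep the difficulty of observation prediction in stochastic environments.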
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Collect | Collect Stochastic Original | Average Reward | 8.11 | 6 |
| Collect | Collect Stochastic MWs | Average Reward | 7.46 | 6 |
| Reinforcement Learning | Procgen easy levels zero-shot generalization 16 games (test) | bigfish | 0.2969 | 6 |
| Collect | Collect Stochastic, FGs | Average Reward | 4.45 | 6 |
| Collect | Collect Deterministic Original | Average Reward | 9.29 | 6 |
| Collect | Collect Deterministic, FGs | Average Reward | 5.43 | 6 |
| Collect | Collect Deterministic MWs | Average Reward | 8.31 | 6 |
| Atari Game Playing | Atari 2600 (test) | Frostbite | 3.81e+3 | 2 |