Playing Atari with Deep Reinforcement Learning
About
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller • 2013
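The abstract describes a network trained with a variant of Q-learning whose output estimates future rewards. As a rough illustration only, the sketch below computes the regression targets used in standard one-step Q-learning for a batch of transitions. The function name `q_learning_targets`, the discount factor values, and the toy inputs are assumptions made for this example and are not taken from the page; the specific variant used by the authors is not detailed here.

```python
import numpy as np

def q_learning_targets(rewards, next_q_values, terminal, gamma=0.99):
    """Standard one-step Q-learning targets for a batch of transitions.

    rewards:       shape (batch,), reward received after each action
    next_q_values: shape (batch, n_actions), value estimates for the next state
    terminal:      shape (batch,), True where the episode ended
    gamma:         discount factor (the default 0.99 is an illustrative assumption)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    terminal = np.asarray(terminal, dtype=bool)
    # Best value reachable from the next state under the current estimates.
    best_next = next_q_values.max(axis=1)
    # Do not bootstrap past the end of an episode.
    return rewards + gamma * np.where(terminal, 0.0, best_next)

# Toy batch: two transitions, two actions (hypothetical numbers).
targets = q_learning_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    terminal=[False, True],
    gamma=0.9,
)
print(targets)
```

In a deep Q-learning setup, a network's prediction for the action actually taken would be regressed toward these targets, with `next_q_values` coming from a forward pass on the next observation.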
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reinforcement Learning | CartPole v0 | Mean Score | 154.5 | 48 |
| Single Asset Trading | TSLA (test) | CR % | -1.296 | 24 |
| Reinforcement Learning | Atari Breakout | Mean Return | 31 | 23 |
| Reinforcement Learning | Atari Pong | Mean Episode Return | -3 | 19 |
| Game Playing | Atari 2600 (Arcade Learning Environment) v1 (test) | Alien Score | 6.88e+3 | 13 |
| SIS | SIS Random Geometric Graphon, 40 agents | Mean Episode Reward | -11.68 | 12 |
| SIS | SIS Erdős Rényi Graphon, 40 agents | Mean Episode Reward | -16.17 | 12 |
| SIS | SIS Stochastic Block Graphon, 40 agents | Mean Episode Reward | -15.52 | 12 |
| Control Task | Lunar Lander (test) | Average Reward | 0.508 | 11 |
| Reinforcement Learning | Atari 2600 Enduro | Mean Score | 368 | 10 |
Showing 10 of 79 rows