Evolution Strategies as a Scalable Alternative to Reinforcement Learning

About

We explore the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: by using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black-box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
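The common-random-numbers idea in the abstract can be illustrated with a small, single-process sketch. This is not the authors' implementation: the objective `toy_return`, the helper names (`noise`, `es_step`, `run`), the seed schedule, and the hyperparameters are all assumptions chosen for illustration. The point it tries to show is that if every worker knows every other worker's random seed, each worker only has to report one scalar return, and anyone can rebuild the full perturbation vectors locally.

```python
import random


def noise(seed, dim):
    """Regenerate a Gaussian perturbation vector from an integer seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_return(theta):
    """Hypothetical black-box objective, maximized at theta = (1, ..., 1)."""
    return -sum((t - 1.0) ** 2 for t in theta)


def es_step(theta, seeds, sigma=0.1, lr=0.05):
    """One ES update in which simulated workers exchange only scalar returns."""
    dim = len(theta)

    # Each simulated worker perturbs the parameters with noise derived from
    # its own seed and reports a single scalar: the return it observed.
    returns = []
    for seed in seeds:
        eps = noise(seed, dim)
        returns.append(toy_return([t + sigma * e for t, e in zip(theta, eps)]))

    # Because the seeds are shared, the perturbations are rebuilt locally
    # rather than transmitted. Subtracting the mean return is a simple
    # baseline used here to reduce variance (an assumption of this sketch).
    mean_return = sum(returns) / len(returns)
    grad = [0.0] * dim
    for seed, ret in zip(seeds, returns):
        eps = noise(seed, dim)
        for i in range(dim):
            grad[i] += (ret - mean_return) * eps[i]

    scale = lr / (len(seeds) * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]


def run(iterations=200, workers=20, dim=3):
    """Run the sketch from theta = 0 with a deterministic seed schedule."""
    theta = [0.0] * dim
    for it in range(iterations):
        seeds = [it * workers + w for w in range(workers)]
        theta = es_step(theta, seeds)
    return theta


if __name__ == "__main__":
    print(run())
```

In a real distributed setting each loop iteration over `seeds` would run on a separate worker, and only the entries of `returns` would cross the network; the parameter dimension never appears in the communication cost.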

Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever • 2017

Related benchmarks

Task	Dataset	Result
Reinforcement Learning	LunarLanderContinuous v2	Mean Reward: 115	65
Reinforcement Learning	Atari 2600 Montezuma's Revenge	Score: 0.00e+0	45
Reinforcement Learning	HalfCheetah v3	Mean Reward: 2.42e+3	34
Reinforcement Learning	InvertedPendulum v2	Mean Reward: 651.9	27
Global Optimization	F2 benchmark function	Final Error: 0.012	25
Continuous Control	Humanoid 17-DoF	Final Return: 1.25e+4	21
Reinforcement Learning	Atari 2600 Qbert	Score: 147.5	20
Continuous Control	Hopper 3-DoF	Final Return: 2.56e+3	18
Reinforcement Learning	Swimmer v3	Mean Reward: 318.4	15
Global Optimization	F5 benchmark function	Final Error: 0.0012	14

Showing 10 of 48 rows
