Proximal Policy Optimization with Evolutionary Mutations

About

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm known for its stability and sample efficiency, but it often suffers from premature convergence due to limited exploration. In this paper, we propose POEM (Proximal Policy Optimization with Evolutionary Mutations), a novel modification to PPO that introduces an adaptive exploration mechanism inspired by evolutionary algorithms. POEM enhances policy diversity by monitoring the Kullback-Leibler (KL) divergence between the current policy and a moving average of previous policies. When policy changes become minimal, indicating stagnation, POEM triggers an adaptive mutation of policy parameters to promote exploration. We evaluate POEM on four OpenAI Gym environments: CarRacing, MountainCar, BipedalWalker, and LunarLander. With extensive fine-tuning via Bayesian optimization and significance testing via Welch's t-test, we find that POEM significantly outperforms PPO on three of the four tasks (BipedalWalker: t=-2.0642, p=0.0495; CarRacing: t=-6.3987, p=0.0002; MountainCar: t=-6.2431, p<0.0001), while the difference on LunarLander is not statistically significant (t=-1.8707, p=0.0778). Our results highlight the potential of integrating evolutionary principles into policy gradient methods to overcome exploration-exploitation tradeoffs.
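The stagnation check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the policy class, the averaging scheme, the KL threshold, or the mutation distribution. Here we assume a toy linear Gaussian policy with one action dimension (action mean = dot(weights, state), shared log standard deviation), an exponential moving average (EMA) over parameters as the "moving average of previous policies", a fixed KL threshold, and Gaussian parameter noise whose scale grows as the KL falls further below the threshold. The names `gaussian_kl`, `policy_kl`, `StagnationMutator`, `kl_threshold`, `ema_decay`, and `base_sigma` are all hypothetical.

```python
import math
import random


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """Closed-form KL(p || q) for two univariate Gaussians."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)


def policy_kl(params_p, params_q, states):
    """Mean KL between two linear Gaussian policies over a batch of states.

    params = (weights, log_std); the action mean is dot(weights, state).
    """
    w_p, log_std_p = params_p
    w_q, log_std_q = params_q
    std_p, std_q = math.exp(log_std_p), math.exp(log_std_q)
    total = 0.0
    for s in states:
        mu_p = sum(wi * si for wi, si in zip(w_p, s))
        mu_q = sum(wi * si for wi, si in zip(w_q, s))
        total += gaussian_kl(mu_p, std_p, mu_q, std_q)
    return total / len(states)


class StagnationMutator:
    """Hypothetical POEM-style check: mutate when KL to the EMA policy is small."""

    def __init__(self, params, kl_threshold=1e-3, ema_decay=0.9,
                 base_sigma=0.1, seed=0):
        # The EMA starts at the parameters held before the first PPO update.
        self.ema = (list(params[0]), params[1])
        self.kl_threshold = kl_threshold
        self.ema_decay = ema_decay
        self.base_sigma = base_sigma
        self.rng = random.Random(seed)

    def step(self, params, states):
        """Call after each PPO update; returns (params, kl, mutated)."""
        kl = policy_kl(params, self.ema, states)
        mutated = kl < self.kl_threshold
        if mutated:
            # Assumed adaptive scale: the closer KL is to zero, the larger the noise.
            sigma = self.base_sigma * (1.0 - kl / self.kl_threshold)
            w, log_std = params
            params = ([wi + self.rng.gauss(0.0, sigma) for wi in w], log_std)
        # Fold the (possibly mutated) parameters into the moving average.
        d = self.ema_decay
        w_ema = [d * e + (1.0 - d) * wi for e, wi in zip(self.ema[0], params[0])]
        self.ema = (w_ema, d * self.ema[1] + (1.0 - d) * params[1])
        return params, kl, mutated
```

In this sketch a large PPO update (KL well above the threshold) passes through unchanged, while an update that barely moves the policy away from its moving average is perturbed. Averaging parameters rather than action distributions is a simplification chosen so the KL stays in closed form; a real PPO agent would evaluate the KL on the network's outputs over a batch of sampled states.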

Casimir Czworkowski, Stephen Hornish, Alhassan S. Yasin · 2026

Related benchmarks

Task	Dataset	Metric	Result
Reinforcement Learning	MountainCarContinuous-v0	Average Agent Reward	93.52	65
Reinforcement Learning	LunarLander-v3	Average Agent Reward	242.1	14
Reinforcement Learning	BipedalWalker-v3	Return	180.6	6
Reinforcement Learning	CarRacing-v3	Average Agent Reward	640	2
Control Task	OpenAI Gym BipedalWalker	t-statistic	-2.0642	1
Control Task	OpenAI Gym CarRacing	t-statistic	-6.3987	1
Control Task	OpenAI Gym MountainCar	t-statistic	-6.2431	1
Control Task	OpenAI Gym LunarLander	t-statistic	-1.8707	1
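The t-statistics above come from Welch's t-test, which compares two sample means without assuming equal variances. A minimal standard-library sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the function name `welch_t` and the sample values in the test are hypothetical, not data from the paper. For the p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same t along with a two-sided p. The negative signs in the table are consistent with the PPO returns being passed as the first sample, but the source does not state the argument order, so treat that as an assumption.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    A negative t means mean(a) < mean(b).
    """
    # Squared standard errors of each mean (statistics.variance uses n - 1).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With equal sample sizes and equal variances the degrees of freedom reduce to 2n - 2; otherwise Welch's df is smaller, which makes the test more conservative than Student's pooled t-test.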
