Proximal Policy Optimization with Evolutionary Mutations

About

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm known for its stability and sample efficiency, but it often suffers from premature convergence due to limited exploration. In this paper, we propose POEM (Proximal Policy Optimization with Evolutionary Mutations), a novel modification to PPO that introduces an adaptive exploration mechanism inspired by evolutionary algorithms. POEM enhances policy diversity by monitoring the Kullback-Leibler (KL) divergence between the current policy and a moving average of previous policies. When policy changes become minimal, indicating stagnation, POEM triggers an adaptive mutation of policy parameters to promote exploration. We evaluate POEM on four OpenAI Gym environments: CarRacing, MountainCar, BipedalWalker, and LunarLander. Through extensive fine-tuning using Bayesian optimization techniques and statistical testing using Welch's t-test, we find that POEM significantly outperforms PPO on three of the four tasks (BipedalWalker: t=-2.0642, p=0.0495; CarRacing: t=-6.3987, p=0.0002; MountainCar: t=-6.2431, p<0.0001), while the difference on LunarLander is not statistically significant (t=-1.8707, p=0.0778). Our results highlight the potential of integrating evolutionary principles into policy gradient methods to better balance exploration and exploitation.
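The stagnation trigger described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy categorical policy parameterized by logits, an exponential moving average over past action distributions, a fixed KL threshold, and fixed-scale Gaussian noise as the mutation. The class name `StagnationMutator` and all parameter names and default values are hypothetical, and the abstract does not say how the mutation strength adapts, so that part is left out.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


class StagnationMutator:
    """Hypothetical helper: perturbs parameters when the policy stops changing."""

    def __init__(self, kl_threshold=1e-3, ema_decay=0.9, mutation_scale=0.1, seed=0):
        self.kl_threshold = kl_threshold      # assumed stagnation threshold
        self.ema_decay = ema_decay            # assumed moving-average decay
        self.mutation_scale = mutation_scale  # assumed fixed noise std-dev
        self.rng = random.Random(seed)
        self.avg_probs = None                 # moving average of past policies

    def step(self, logits):
        """Return (possibly mutated logits, whether a mutation happened)."""
        probs = softmax(logits)
        if self.avg_probs is None:
            # First call: nothing to compare against yet.
            self.avg_probs = probs
            return list(logits), False

        # Small divergence from the averaged past policy signals stagnation.
        kl = kl_divergence(probs, self.avg_probs)
        mutated = kl < self.kl_threshold
        if mutated:
            logits = [x + self.rng.gauss(0.0, self.mutation_scale) for x in logits]
            probs = softmax(logits)

        # Fold the (possibly mutated) policy into the moving average.
        d = self.ema_decay
        self.avg_probs = [d * a + (1 - d) * p for a, p in zip(self.avg_probs, probs)]
        return list(logits), mutated
```

In a real PPO loop, `step` would presumably be called once per policy update on the network parameters rather than on raw logits. The direction of the KL term and the choice to average distributions rather than parameters are also assumptions made here for simplicity.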

Casimir Czworkowski, Stephen Hornish, Alhassan S. Yasin • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Reinforcement Learning | BipedalWalker v3 | Return | 180.6 | 2
Reinforcement Learning | CarRacing v3 | Average Agent Reward | 640 | 2
Reinforcement Learning | MountainCarContinuous v0 | Average Agent Reward | 93.52 | 2
Reinforcement Learning | LunarLander v3 | Average Agent Reward | 242.1 | 2
Control Task | OpenAI Gym BipedalWalker | T-statistic | -2.0642 | 1
Control Task | OpenAI Gym CarRacing | T-statistic | -6.3987 | 1
Control Task | OpenAI Gym MountainCar | T-statistic | -6.2431 | 1
Control Task | OpenAI Gym LunarLander | T-statistic | -1.8707 | 1
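
For readers unfamiliar with the T-statistic entries, the following is a minimal sketch of how a Welch's t-statistic and its Welch-Satterthwaite degrees of freedom are computed from two samples with unequal variances. The function name `welch_t` and the sample values are illustrative only and are not data from the paper. The negative values reported above are consistent with the PPO returns being the first sample, though the page does not state the order.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on samples a and b."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / na
    vb = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy numbers for illustration only; NOT results from the paper.
baseline_returns = [1.0, 2.0, 3.0, 4.0, 5.0]
variant_returns = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(baseline_returns, variant_returns)
print(f"t = {t_stat:.4f}, df = {dof:.2f}")
```

The two-sided p-value requires the Student t distribution, which the Python standard library does not provide. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value.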