
Improving Regret Approximation for Unsupervised Dynamic Environment Generation

About

Unsupervised Environment Design (UED) seeks to automatically generate training curricula for reinforcement learning (RL) agents, with the goal of improving generalisation and zero-shot performance. However, designing effective curricula remains difficult, particularly in settings where small subsets of environment parameterisations lead to large increases in the complexity of the required policy. Current methods struggle with a difficult credit assignment problem and rely on regret approximations that fail to identify challenging levels; both issues are compounded as the size of the environment grows. We propose Dynamic Environment Generation for UED (DEGen), which provides the level generator with a denser reward signal, reducing the difficulty of credit assignment and allowing UED to scale to larger environment sizes. We also introduce a new regret approximation, Maximised Negative Advantage (MNA), a significantly improved optimisation target that better identifies challenging levels. We show empirically that MNA outperforms current regret approximations and, when combined with DEGen, consistently outperforms existing methods, especially as the size of the environment grows. All our code is available at: https://github.com/HarryMJMead/Dynamic-Environment-Generation-for-UED.
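The abstract names Maximised Negative Advantage (MNA) but does not define it, so the following is only a minimal sketch of what a regret-style level score of that shape could look like, assuming a literal reading of the name: score a level by the largest negated per-step advantage the agent incurred on it. The function name `mna_score` and the use of raw per-step advantage estimates are illustrative assumptions, not the paper's definition.

```python
def mna_score(advantages):
    """Hypothetical MNA-style regret score for one level.

    `advantages` is a sequence of per-step advantage estimates
    A(s_t, a_t) collected while the agent played the level.
    A literal "maximised negative advantage" reading: return the
    largest value of -A(s_t, a_t), so levels containing at least
    one strongly suboptimal step score highly.
    """
    return max(-a for a in advantages)

# A level with one strongly suboptimal step (advantage -2.0) scores
# higher than a level the agent handled near-optimally throughout.
hard = mna_score([0.1, -2.0, 0.3])   # -> 2.0
easy = mna_score([0.1, 0.05, 0.0])   # -> 0.0
print(hard > easy)  # -> True
```

Under this reading, the score is dense in the sense that any single bad step produces a non-trivial signal, which is the property a level generator would need for the denser reward signal DEGen targets.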

Harry Mead, Bruno Lacerda, Jakob Foerster, Nick Hawes • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Puzzle Solving | Sokoban Jr_1 Levels 1.0 | Solve Rate: 49 | 5 |
| | Key Minigrid 13x13 | SixteenRooms Solve Rate: 100 | 5 |
| | MiniGrid | SixteenRooms Solve Rate: 100 | 5 |
| Grid-world Navigation | Key Minigrid 17x17 zero-shot 1.0 | Success Rate (SixteenRooms): 100 | 3 |
| | Key Minigrid 21x21 | SixteenRooms Key: 100 | 3 |
