Adversarial Environment Design via Regret-Guided Diffusion Models

About

Training agents that are robust to environmental changes remains a significant challenge in deep reinforcement learning (RL). Unsupervised environment design (UED) has recently emerged to address this issue by generating a set of training environments tailored to the agent's capabilities. While prior works demonstrate that UED has the potential to learn a robust policy, their performance is constrained by the capabilities of the environment generation. To this end, we propose a novel UED algorithm, adversarial environment design via regret-guided diffusion models (ADD). The proposed method guides the diffusion-based environment generator with the regret of the agent to produce environments that the agent finds challenging but conducive to further improvement. By exploiting the representation power of diffusion models, ADD can directly generate adversarial environments while maintaining the diversity of training environments, enabling the agent to effectively learn a robust policy. Our experimental results demonstrate that the proposed method successfully generates an instructive curriculum of environments, outperforming UED baselines in zero-shot generalization across novel, out-of-distribution environments. Project page: https://rllab-snu.github.io/projects/ADD
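The guidance idea in the abstract can be illustrated with a toy sampler. The sketch below is not the authors' implementation. It assumes a standard normal prior over a 2-D environment parameter vector in place of a learned diffusion model, a hypothetical quadratic regret estimate that peaks at `hard_spot`, and plain Langevin sampling in place of the reverse diffusion process. All function names and parameters are illustrative. It shows only how adding a weighted regret gradient to the generator's score shifts samples toward high-regret environments while keeping them spread out.

```python
import numpy as np

def prior_score(x):
    # Score (gradient of the log-density) of a standard normal prior over
    # environment parameters. Assumption: stands in for a learned model.
    return -x

def regret_grad(x, hard_spot):
    # Gradient of a hypothetical regret estimate
    # regret(x) = -0.5 * ||x - hard_spot||^2, which is highest at hard_spot.
    return -(x - hard_spot)

def sample_guided(n, hard_spot, weight, steps=2000, step_size=0.01, seed=0):
    # Run n independent Langevin chains on the guided score
    # prior_score + weight * regret_grad.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, hard_spot.shape[0]))
    for _ in range(steps):
        score = prior_score(x) + weight * regret_grad(x, hard_spot)
        noise = rng.standard_normal(x.shape)
        x = x + step_size * score + np.sqrt(2.0 * step_size) * noise
    return x

hard_spot = np.array([2.0, 0.0])
unguided = sample_guided(2000, hard_spot, weight=0.0)
guided = sample_guided(2000, hard_spot, weight=1.0)

print("unguided mean:", unguided.mean(axis=0))
print("guided mean:  ", guided.mean(axis=0))
print("guided std:   ", guided.std(axis=0))
```

In this toy setting the guided samples concentrate around a point between the prior mean and `hard_spot`, and their spread stays nonzero. Under these assumptions, `weight` plays the role of a knob that trades adversarial difficulty against diversity.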

Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh • 2024

Related benchmarks

Task                             Dataset                        Metric          Result  Rank
Navigation                       MiniWorld FourRooms            Success Rate    61      15
2D bipedal locomotion            Basic (OpenAI Gym) (test)      Average Return  312     6
2D bipedal locomotion            Hardcore (OpenAI Gym) (test)   Average Return  140.1   6
2D bipedal locomotion            Stairs (test)                  Average Return  75.4    6
2D bipedal locomotion            PitGap (test)                  Average Return  143.2   6
2D bipedal locomotion            Stump (test)                   Average Return  58.2    6
2D bipedal locomotion            Roughness (test)               Average Return  168.9   6
Partially observable navigation  Minigrid 16Rooms2              Solved Rate     100     6
Partially observable navigation  Minigrid Labyrinth             Solved Rate     100     6
Partially observable navigation  Minigrid Labyrinth2            Solved Rate     97      6
