TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
About
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED. Project Page: https://geonwoo.me/traced/
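The abstract names two ingredients: a regret approximation that adds transition-prediction error to value-function loss, and a Co-Learnability metric for how training on one task affects others. The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions, not the paper's method. The additive combination, the weights `alpha` and `beta`, the function names, and the definition of co-learnability as the mean regret reduction on other tasks are all hypothetical choices made for illustration.

```python
from statistics import mean


def approx_regret(value_losses, transition_errors, alpha=1.0):
    """Illustrative regret proxy for one task.

    Assumption: mean value-function loss plus an alpha-weighted mean
    transition-prediction error. The actual weighting is not given in the abstract.
    """
    return mean(value_losses) + alpha * mean(transition_errors)


def co_learnability(regret_before, regret_after, trained_task):
    """Illustrative co-learnability of `trained_task`.

    Assumption: the average drop in approximate regret on all *other* tasks
    after the student trains on `trained_task`. Both arguments map task id -> regret.
    """
    others = [t for t in regret_before if t != trained_task and t in regret_after]
    if not others:
        return 0.0
    return mean(regret_before[t] - regret_after[t] for t in others)


def task_priority(regret, colearn, beta=1.0):
    """Hypothetical teacher score: higher means the task is sampled more often."""
    return regret + beta * colearn


if __name__ == "__main__":
    # Toy numbers only; they are not results from the paper.
    before = {"a": 1.0, "b": 2.0, "c": 3.0}
    after = {"a": 0.5, "b": 1.5, "c": 2.0}
    r = approx_regret([0.2, 0.4], [0.1, 0.3], alpha=0.5)
    c = co_learnability(before, after, trained_task="a")
    print(task_priority(r, c))
```

In this sketch a task scores highly either because the student still performs poorly on it (high regret proxy) or because training on it reduces regret on other tasks. Any real teacher would need the paper's actual definitions in place of these placeholders.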
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Locomotion | BipedalWalker Basic terrain | Mean Return | 293.7 | 11 |
| Locomotion | BipedalWalker Hardcore terrain | Mean Return | 86.83 | 11 |
| Locomotion | BipedalWalker Stump terrain | Mean Return | 34.16 | 11 |
| Locomotion | BipedalWalker Overall Mean | Mean Return | 89.95 | 11 |
| Locomotion | BipedalWalker Roughness terrain | Mean Return | 193.3 | 11 |
| Locomotion | BipedalWalker PitGap terrain | Mean Return | -39.26 | 11 |
| Locomotion | BipedalWalker Stairs terrain | Mean Return | -29 | 11 |
| Maze Solving | PerfectMaze Large (held-out) | Solved Rate | 27 | 9 |
| Solved Rate | BipedalWalker Zero-Shot (test) | Basic Solved Rate | 100 | 9 |
| Maze Solving | PerfectMaze XL (held-out) | Solved Rate | 10 | 9 |