TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

About

Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED. Project Page: https://geonwoo.me/traced/

Geonwoo Cho, Jaegyun Im, Jihwan Lee, Hojun Yi, Sejin Kim, Sundong Kim • 2025
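
The abstract describes the task score only in words, so the following is a minimal sketch of how the pieces might fit together, not the authors' implementation. It assumes the regret approximation is a weighted sum of value-function loss and transition-prediction error, that Co-Learnability can be approximated as the average regret reduction on other tasks after training on one task, and that the two are combined additively. The names TaskStats, approx_regret, co_learnability, and task_priority, and the weights alpha and beta, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Per-task statistics collected from student rollouts (hypothetical container)."""
    value_loss: float        # mean value-function error on the task
    transition_error: float  # mean error of a learned transition model on the task


def approx_regret(stats: TaskStats, alpha: float = 1.0) -> float:
    # Assumption: value loss plus a weighted transition-prediction error.
    return stats.value_loss + alpha * stats.transition_error


def co_learnability(task_id: str,
                    regret_before: dict[str, float],
                    regret_after: dict[str, float]) -> float:
    # Assumption: average drop in approximate regret on the *other* tasks
    # after the student trained on `task_id`.
    others = [t for t in regret_before if t != task_id and t in regret_after]
    if not others:
        return 0.0
    return sum(regret_before[t] - regret_after[t] for t in others) / len(others)


def task_priority(regret: float, colearn: float, beta: float = 1.0) -> float:
    # Assumption: additive combination; higher means "replay this task sooner".
    return regret + beta * colearn


# Usage: score task "a" given regret estimates before and after training on it.
before = {"a": 1.0, "b": 2.0, "c": 3.0}
after = {"a": 0.5, "b": 1.5, "c": 2.0}
regret_a = approx_regret(TaskStats(value_loss=0.5, transition_error=0.2), alpha=2.0)
score_a = task_priority(regret_a, co_learnability("a", before, after), beta=1.0)
```

A teacher could rank candidate tasks by such a score and sample the highest-ranked ones for the next round of student training; how TRACED actually weights and samples tasks is specified in the paper, not here.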

Related benchmarks

Task          Dataset                          Result                 Rank
Locomotion    BipedalWalker Basic terrain      Mean Return 293.7      11
Locomotion    BipedalWalker Hardcore terrain   Mean Return 86.83      11
Locomotion    BipedalWalker Stump terrain      Mean Return 34.16      11
Locomotion    BipedalWalker Overall Mean       Mean Return 89.95      11
Locomotion    BipedalWalker Roughness terrain  Mean Return 193.3      11
Locomotion    BipedalWalker PitGap terrain     Mean Return -39.26     11
Locomotion    BipedalWalker Stairs terrain     Mean Return -29        11
Maze Solving  PerfectMaze Large (held-out)     Solved Rate 27         9
Solved Rate   BipedalWalker Zero-Shot (test)   Basic Solved Rate 100  9
Maze Solving  PerfectMaze XL (held-out)        Solved Rate 10         9
Showing 10 of 15 rows
