
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

About

Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula.
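The idea of keeping training "at the frontier of the agent's capabilities" can be illustrated with a small sketch. This is not the ULEE goal generator, which the abstract describes as adversarial; it swaps in a simple heuristic sampler that favors goals whose estimated post-adaptation success is intermediate. All names here (`frontier_weights`, `sample_goal`, `update_estimate`, and the `target`, `sharpness`, and `step` parameters) are hypothetical and chosen only for illustration.

```python
import random


def frontier_weights(success_estimates, target=0.5, sharpness=4.0):
    """Weight each goal by how close its estimated success is to `target`.

    Assumption: estimates lie in [0, 1] and 0 < target < 1. Goals that are
    already solved (near 1) or currently hopeless (near 0) get low weight.
    """
    widest_gap = max(target, 1.0 - target)
    weights = {}
    for goal, estimate in success_estimates.items():
        closeness = 1.0 - abs(estimate - target) / widest_gap
        weights[goal] = closeness ** sharpness
    return weights


def sample_goal(success_estimates, rng, target=0.5, sharpness=4.0):
    """Draw one goal with probability proportional to its frontier weight."""
    weights = frontier_weights(success_estimates, target, sharpness)
    goals = list(weights)
    if sum(weights.values()) == 0.0:
        # Every goal is at an extreme; fall back to uniform sampling.
        return rng.choice(goals)
    return rng.choices(goals, weights=[weights[g] for g in goals], k=1)[0]


def update_estimate(success_estimates, goal, post_adaptation_return, step=0.1):
    """Move the goal's estimate toward the latest post-adaptation return.

    Assumption: an exponential moving average stands in for the paper's
    "evolving estimates"; the actual estimator is not given in the abstract.
    """
    old = success_estimates[goal]
    success_estimates[goal] = old + step * (post_adaptation_return - old)


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = {"easy": 1.0, "frontier": 0.5, "hard": 0.0}
    goal = sample_goal(estimates, rng)
    update_estimate(estimates, goal, post_adaptation_return=1.0)
    print(goal, estimates[goal])
```

In a loop, each sampled goal would be attempted over several episodes, and the return after adaptation would be fed back through `update_estimate`, so the sampling distribution shifts as the agent improves.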

Octavio Pappalardo • 2026

Related benchmarks

Task                  Dataset        Result             Rank
BlockedUnlockPickUp   MiniGrid       Mean Return 0.43   5
DoorKey-16x16         MiniGrid       Mean Return 0.11   5
DoorKey-5x5           MiniGrid       Mean Return 0.27   5
FourRooms             MiniGrid       Mean Return 0.16   5
LockedRoom            MiniGrid       Mean Return 0.01   5
MemoryS16             MiniGrid       Mean Return 0.51   5
MemoryS8              MiniGrid       Mean Return 0.52   5
Path planning         MiniGrid DK-8  Reward 0.24        5
Unlock                MiniGrid       Mean Return 0.75   5
UnlockPickUp          MiniGrid       Mean Return 0.68   5

Showing 10 of 14 rows
