Contextual Latent World Models for Offline Meta Reinforcement Learning

About

Offline meta-reinforcement learning seeks to learn policies that generalize across related tasks from fixed datasets. Context-based methods infer a task representation from transition histories, but learning effective task representations without supervision remains a challenge. In parallel, latent world models have demonstrated strong self-supervised representation learning through temporal consistency. We introduce contextual latent world models, which condition latent world models on inferred task representations and train them jointly with the context encoder. This enforces task-conditioned temporal consistency, yielding task representations that capture task-dependent dynamics rather than merely discriminating between tasks. Our method learns more expressive task representations and significantly improves generalization to unseen tasks across MuJoCo, Contextual-DeepMind Control, and Meta-World benchmarks.
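
To make the idea of task-conditioned temporal consistency concrete, here is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the single-layer tanh networks, the mean-pooled context encoder, the mean-squared-error loss, and every name and dimension (`encode_context`, `consistency_loss`, `OBS`, `ACT`, `LATENT`, `TASK`) are assumptions chosen for illustration. It only evaluates the loss; in actual training the gradient of this loss would flow into both the world model and the context encoder, which is what training them jointly means.

```python
import math
import random

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def tanh_vec(x):
    return [math.tanh(v) for v in x]

def init_matrix(rng, n_out, n_in):
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]

def encode_context(w_ctx, transitions):
    """Mean-pool per-transition embeddings into one task representation z."""
    embeddings = [tanh_vec(linear(w_ctx, s + a + [r] + s_next))
                  for s, a, r, s_next in transitions]
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]

def consistency_loss(params, context, trajectory):
    """Mean squared error between the latent predicted from
    (latent_t, action_t, z) and the encoding of the next observation."""
    z = encode_context(params["ctx"], context)
    total = 0.0
    for s, a, _r, s_next in trajectory:
        latent = tanh_vec(linear(params["enc"], s))
        # The dynamics model sees the task representation z at every step.
        predicted = tanh_vec(linear(params["dyn"], latent + a + z))
        # Assumption: during training this target would usually be treated
        # as a constant (no gradient flows through it).
        target = tanh_vec(linear(params["enc"], s_next))
        total += sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
    return total / len(trajectory)

# Hypothetical sizes for observation, action, latent state, and task vector.
OBS, ACT, LATENT, TASK = 3, 2, 4, 2
rng = random.Random(0)
params = {
    "ctx": init_matrix(rng, TASK, OBS + ACT + 1 + OBS),
    "enc": init_matrix(rng, LATENT, OBS),
    "dyn": init_matrix(rng, LATENT, LATENT + ACT + TASK),
}

def random_transition(rng):
    """Random (state, action, reward, next_state) tuple standing in for offline data."""
    s = [rng.uniform(-1, 1) for _ in range(OBS)]
    a = [rng.uniform(-1, 1) for _ in range(ACT)]
    s_next = [rng.uniform(-1, 1) for _ in range(OBS)]
    return s, a, rng.uniform(-1, 1), s_next

context = [random_transition(rng) for _ in range(5)]
trajectory = [random_transition(rng) for _ in range(4)]
z = encode_context(params["ctx"], context)
loss = consistency_loss(params, context, trajectory)
print("task representation:", z)
print("temporal-consistency loss:", loss)
```

Because the dynamics prediction depends on `z`, minimizing this loss pushes the context encoder toward representations that help predict task-dependent dynamics, rather than ones that only tell tasks apart.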

Mohammadreza Nakheai, Aidan Scannell, Kevin Luck, Joni Pajarinen • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Ant-dir | MuJoCo in-distribution | Average Return | 726.7 | 6 |
| Cheetah-LS | Contextual-DMC (in-distribution) | Average Return | 935 | 6 |
| Cheetah-speed | Contextual-DMC (in-distribution) | Average Return | 706.4 | 6 |
| Finger-LS | Contextual-DMC (in-distribution) | Average Return | 972 | 6 |
| Finger-speed | Contextual-DMC (in-distribution) | Average Return | 943.3 | 6 |
| Hopper-mass | MuJoCo in-distribution | Average Return | 566 | 6 |
| Meta-Reinforcement Learning | Meta-World in-distribution v2 (test) | Assembly Success Rate | 0.00e+0 | 6 |
| Offline Meta-Reinforcement Learning | MuJoCo Ant-dir In-distribution | Average Return | 863.1 | 6 |
| Offline Meta-Reinforcement Learning | MuJoCo Cheetah-LS In-distribution | Average Return | 944.8 | 6 |
| Offline Meta-Reinforcement Learning | MuJoCo Cheetah-speed In-distribution | Average Return | 751.2 | 6 |
Showing 10 of 28 rows
