Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

About

Meta-reinforcement learning (meta-RL) aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.

Jeongmo Kim, Yisak Park, Minung Kim, Seungyul Han • 2025

Related benchmarks

Task   Dataset                                        Metric                Result  Rank
Push   Meta-World ML-1 (test)                         Success Rate          0.98    12
Push   MetaWorld ML1 Push-OOD-Extra (extrapolation)   Average Success Rate  92      9
Reach  MetaWorld ML1 Reach                            Average Success Rate  98      9
Reach  MetaWorld ML1 Reach-OOD (interpolation)        Average Success Rate  96      9
Reach  MetaWorld ML1 Reach-OOD-Extra (extrapolation)  Success Rate          99      9
Push   MetaWorld ML1 Push OOD (interpolation)         Average Success Rate  98      9
