Train Hard, Fight Easy: Robust Meta Reinforcement Learning

About

A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients. Meta-RL (MRL) addresses this issue by learning a meta-policy that adapts to new tasks. Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty. This limits system reliability since test tasks are not known in advance. In this work, we define a robust MRL objective with a controlled robustness level. Optimization of analogous robust objectives in RL is known to lead to both *biased gradients* and *data inefficiency*. We prove that the gradient bias disappears in our proposed MRL framework. The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML). RoML is a meta-algorithm that generates a robust version of any given MRL algorithm, by identifying and over-sampling harder tasks throughout training. We demonstrate that RoML achieves robust returns on multiple navigation and continuous control benchmarks.
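To make the contrast between an average-return objective and a tail-focused one concrete, here is a minimal sketch. It assumes the robust objective is measured as CVaR at level alpha (the metric reported in the benchmarks on this page), estimated as the mean of the worst alpha-fraction of per-task returns. The function name `cvar`, the sample returns, and the estimator itself are illustrative assumptions, not the authors' implementation.

```python
import math


def cvar(returns, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) returns.

    Assumption: a plain sample estimator chosen for illustration;
    the paper's exact estimator is not reproduced here.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]  # lowest returns = hardest tasks
    return sum(worst) / k


# Hypothetical per-task returns for a single meta-policy (made-up numbers).
task_returns = [120.0, 95.0, -40.0, 130.0, 110.0,
                -200.0, 105.0, 90.0, 125.0, 115.0]

mean_return = sum(task_returns) / len(task_returns)  # average over all tasks
tail_return = cvar(task_returns, alpha=0.2)          # average over the 2 worst

print(f"mean return: {mean_return}")
print(f"CVaR 0.2 return: {tail_return}")
```

In this made-up sample the mean looks healthy while the tail average is strongly negative, which is the failure mode the abstract attributes to mean-optimizing methods. Note that only `k` of the sampled tasks contribute to the tail estimate; this is one way to see the data inefficiency the abstract mentions, which RoML addresses by over-sampling harder tasks.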

Ido Greenberg, Shie Mannor, Gal Chechik, Eli Meirom • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | MuJoCo HalfCheetah Vel (test) | Mean Return | -95 | 9 |
| Meta-Reinforcement Learning | MuJoCo HalfCheetah Velocity variation (test) | CVaR 0.05 Return | -184 | 7 |
| Meta-Reinforcement Learning | MuJoCo HalfCheetah Mass variation (test) | CVaR 0.05 Return | 1.26e+3 | 7 |
| Meta-Reinforcement Learning | MuJoCo HalfCheetah Body variation (test) | CVaR 0.05 Return | 935 | 7 |
| Meta-Reinforcement Learning | MuJoCo HalfCheetah 10D-task (a) (test) | CVaR 0.05 Return | 1.23e+3 | 7 |
| Meta-Reinforcement Learning | MuJoCo HalfCheetah 10D-task (b) (test) | CVaR 0.05 Return | 1.70e+3 | 7 |
| Meta-Reinforcement Learning | MuJoCo HalfCheetah 10D-task (c) (test) | CVaR 0.05 Return | 1.02e+3 | 7 |
| Continuous Control | MuJoCo HalfCheetah Mass (test) | Mean Return | 1.58e+3 | 7 |
| Continuous Control | MuJoCo HalfCheetah 10D-task (b) | Mean Return | 1.95e+3 | 7 |
| Continuous Control | MuJoCo HalfCheetah 10D-task (c) | Mean Return | 1.62e+3 | 7 |
Showing 10 of 24 rows
