A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

About

The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real-world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. We rely on a unifying formalism and terminology for discussing different ZSG problems, building upon previous works. We go on to categorise existing benchmarks for ZSG, as well as current methods for tackling these problems. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, we suggest fast online adaptation and tackling RL-specific problems as some areas for future work on methods for ZSG, and we recommend building benchmarks in underexplored problem settings such as offline RL ZSG and reward-function variation.

Robert Kirk, Amy Zhang, Edward Grefenstette, Tim Rocktäschel • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Traffic Signal Control | Traffic Signal Road Length variation | Normalized Reward 0.9409 | 6 |
| Advisory autonomy | Advisory Autonomy Single lane ring (Acceleration guidance) | Normalized Reward 92.19 | 6 |
| Advisory autonomy | Advisory Autonomy Single lane ring (Speed guidance) | Normalized Reward 0.9688 | 6 |
| Advisory autonomy | Advisory Autonomy Highway ramp (Acceleration guidance) | Normalized Reward 0.5374 | 6 |
| Advisory autonomy | Advisory Autonomy Highway ramp (Speed guidance) | Normalized Reward 54.73 | 6 |
| Dynamic eco-driving | Eco-Driving Penetration Rate variation | Normalized Reward 0.526 | 6 |
| Dynamic eco-driving | Eco-Driving Inflow variation | Normalized Reward 0.4061 | 6 |
| Dynamic eco-driving | Eco-Driving Green Phase variation | Normalized Reward 0.4228 | 6 |
| Traffic Signal Control | Traffic Signal Inflow variation | Normalized Reward 0.8646 | 6 |
| Traffic Signal Control | Traffic Signal Speed Limit variation | Normalized Reward 88.57 | 6 |
