Model-Based Offline Planning

About

Offline learning is a key part of making reinforcement learning (RL) useable in real systems. Offline RL looks at scenarios where there is data from a system's operation, but no direct access to the system when learning a policy. Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data, or with planning on top of learnt models of the data. Model-free policies tend to be more performant, but are more opaque, harder to command externally, and less easy to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This allows us to have easily controllable policies directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP) on a series of robotics-inspired tasks, and demonstrate its ability leverage planning to respect environmental constraints. We are able to find near-optimal polices for certain simulated systems from as little as 50 seconds of real-time system interaction, and create zero-shot goal-conditioned policies on a series of environments. An accompanying video can be found here: https://youtu.be/nxGGHdZOFts

Arthur Argenson, Gabriel Dulac-Arnold• 2020

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL halfcheetah-medium-expert	Normalized Score105.9	169
Offline Reinforcement Learning	D4RL hopper-medium-expert	Normalized Score55.1	161
Offline Reinforcement Learning	D4RL walker2d-medium-expert	Normalized Score70.2	132
Offline Reinforcement Learning	D4RL Medium HalfCheetah	Normalized Score44.6	105
Offline Reinforcement Learning	D4RL Medium Walker2d	Normalized Score41	104
Offline Reinforcement Learning	D4RL Walker2d Medium v2	Normalized Return41	85
Offline Reinforcement Learning	D4RL Medium Hopper	Normalized Score48.8	72
Offline Reinforcement Learning	D4RL HalfCheetah Medium v2	Average Normalized Return44.6	53
Offline Reinforcement Learning	D4RL Gym walker2d medium-expert	Normalized Average Return70.2	43
Offline Reinforcement Learning	D4RL Hopper Medium v2	Normalized Return48.8	43

Showing 10 of 23 rows

Other info

Follow for update

@wizwand_team Discord