Neurosymbolic Reinforcement Learning with Formally Verified Exploration

About

We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.

Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri • 2020
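
The lift, update, project loop described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes one-dimensional dynamics x' = x + DT * a, a safe set |x| <= 1, and a symbolic class of linear policies g_k(x) = -k * x, for which a simple bound on k stands in for verification. A two-parameter function stands in for the neural network, finite differences stand in for the approximate gradients, and a clipped least-squares fit stands in for the projection. All names (`lift`, `update`, `project`, `train`) are hypothetical.

```python
import math

DT = 0.1                 # step size of the assumed toy dynamics x' = x + DT * a
K_MAX = 2.0 / DT         # g_k(x) = -k*x keeps |x| <= 1 invariant iff 0 <= k <= K_MAX
GRID = [i / 10.0 for i in range(-10, 11)]   # sample states used by the projection

def step(x, a):
    return x + DT * a

def is_safe(x):
    return abs(x) <= 1.0 + 1e-12            # small tolerance for rounding

def symbolic(k):
    return lambda x: -k * x                 # restricted, easily checked class

def neural(theta):
    # Stand-in for a neural network: a tiny parametric function.
    return lambda x: theta[0] * math.tanh(x) + theta[1]

def lift(k, theta):
    """Combine the learned part with the symbolic policy acting as a shield."""
    g, f = symbolic(k), neural(theta)
    def shielded(x):
        a = f(x)
        # Use the learned action only if the next state stays safe.
        return a if is_safe(step(x, a)) else g(x)
    return shielded

def rollout(policy, x0=0.0, horizon=30, target=0.5):
    x, total, visited = x0, 0.0, [x0]
    for _ in range(horizon):
        x = step(x, policy(x))
        visited.append(x)
        total -= (x - target) ** 2          # reward: stay close to the target
    return total, visited

def update(k, theta, lr=0.05, eps=1e-3):
    """One finite-difference gradient step; every rollout runs under the shield."""
    base, visited = rollout(lift(k, theta))
    grad = []
    for i in range(len(theta)):
        bumped = list(theta)
        bumped[i] += eps
        reward, states = rollout(lift(k, bumped))
        visited += states
        grad.append((reward - base) / eps)
    new_theta = [t + lr * g for t, g in zip(theta, grad)]
    return new_theta, visited

def project(policy):
    """Least-squares fit of g_k to the shielded policy, clipped to the safe range."""
    num = sum(x * policy(x) for x in GRID)
    den = sum(x * x for x in GRID)
    return min(max(-num / den, 0.0), K_MAX)

def train(iterations=20):
    k, theta, seen = 1.0, [0.0, 0.0], []
    for _ in range(iterations):
        theta, visited = update(k, theta)   # safe updates in the lifted space
        seen += visited
        k = project(lift(k, theta))         # back to the symbolic class
    return k, theta, seen
```

Calling `train()` returns the final gain `k`, the learned parameters, and every state visited during training. In this toy setting the states stay inside the safe set because each action is either checked one step ahead or produced by a symbolic policy whose gain lies in the safe range, so the learned component itself is never verified.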

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reinforcement Learning | ST-mount-car | Mean Performance | 11.4 | 6 |
| Reinforcement Learning | ST-obstacle2 | Mean Score | 9.3 | 6 |
| Reinforcement Learning | ST-obstacle | Mean Performance Score | -41.6 | 6 |
| Reinforcement Learning | ST-road | Mean Performance | 9.7 | 6 |
| Reinforcement Learning | ST-road2d | Mean Score | 11.2 | 6 |
| Safe Reinforcement Learning | obstacle Static dynamics | Mean Shield Invocations per Episode | 4 | 3 |
| Safe Reinforcement Learning | obstacle2 Static dynamics | Mean Shield Invocations / Episode | 33.8 | 3 |
| Safe Reinforcement Learning | mountain-car Static dynamics | Mean Shield Invocations per Episode | 4 | 3 |
| Safe Reinforcement Learning | road Static dynamics | Mean Shield Invocations per Episode | 0.8 | 3 |
| Safe Reinforcement Learning | road2d Static dynamics | Shield Invocations / Episode (Mean) | 0.00e+0 | 3 |