
Prediction and Control in Continual Reinforcement Learning

About

Temporal difference (TD) learning is often used to update the estimate of the value function, which RL agents use to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components that update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning, and we draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach significantly improves performance on both prediction and control problems.

Nishanth Anand, Doina Precup • 2023
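
The abstract only sketches the method, so here is a minimal tabular sketch of how such a permanent/transient decomposition could look. The additive combination of the two components, the TD(0) update, the consolidation rule, and all step sizes below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

class PermanentTransientValue:
    """Tabular value estimate split into a slow 'permanent' component and a
    fast 'transient' component, per the abstract's description. The update
    rules below are assumptions for illustration only."""

    def __init__(self, n_states, alpha_transient=0.5, alpha_permanent=0.05, gamma=0.99):
        self.v_perm = np.zeros(n_states)   # slow: general knowledge that persists over time
        self.v_trans = np.zeros(n_states)  # fast: quick adaptation to the current situation
        self.alpha_t = alpha_transient
        self.alpha_p = alpha_permanent
        self.gamma = gamma

    def value(self, s):
        # Assumed: the overall estimate is the sum of the two components.
        return self.v_perm[s] + self.v_trans[s]

    def td_update(self, s, r, s_next, done):
        # Fast timescale: a TD(0) update applied to the transient component only.
        target = r + (0.0 if done else self.gamma * self.value(s_next))
        td_error = target - self.value(s)
        self.v_trans[s] += self.alpha_t * td_error

    def consolidate(self):
        # Slow timescale (e.g. at a task boundary): fold transient knowledge
        # into the permanent component, then reset the transient component.
        self.v_perm += self.alpha_p * self.v_trans
        self.v_trans[:] = 0.0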

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continual Learning | CW10 (sequence) | Performance | 9.3 | 27 |
| Continual Reinforcement Learning | MinAtar | Breakout Score | 10.71 | 6 |
| Continual Reinforcement Learning | Meta-World (averaged over 3 sequences) | Average Performance | 9.3 | 6 |
| Robotic Manipulation | Meta-World (averaged over 3 sequences) | Average Performance | 0.093 | 6 |
| Continual Reinforcement Learning | ALE/SpaceInvaders v5 | Average Performance | 61 | 5 |
| Continual Reinforcement Learning | ALE/Freeway v5 | Average Performance | 21 | 5 |
