
MICo: Improved representations via sampling-based state similarity for Markov decision processes

About

We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.

Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland • 2021

Related benchmarks

Task                                   Dataset                                Metric              Result   Rank
Continuous Control                     DMControl 500k                         Spin Score          86.9     33
Visual Offline Reinforcement Learning  V-D4RL (various)                       Cheetah-Run Medium  177      8
H-Stand                                DM Control                             Average Return      800.8    6
C-SwingUp                              DM_Control                             Average Return      803.2    6
Continuous Control                     DM_Control distraction setting (test)  BiC-Catch Score     104.2    6
R-Easy                                 DM_Control                             Average Return      186.1    6
Robotic Manipulation                   Meta-World v2                          Success Rate        49.5     6
BiC-Catch                              DM_Control                             Average Return      215.4    6
C-SwingUpSparse                        DM_Control                             Average Return      0.00e+0  6
Ch-Run                                 DM Control                             Average Return      4.9      6
Showing 10 of 12 rows
