Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

State Chrono Representation for Enhancing Generalization in Reinforcement Learning

About

In reinforcement learning with image-based inputs, it is crucial to establish a robust and generalizable state representation. Recent advancements in metric learning, such as deep bisimulation metric approaches, have shown promising results in learning structured low-dimensional representation space from pixel observations, where the distance between states is measured based on task-relevant features. However, these approaches face challenges in demanding generalization tasks and scenarios with non-informative rewards. This is because they fail to capture sufficient long-term information in the learned representations. To address these challenges, we propose a novel State Chrono Representation (SCR) approach. SCR augments state metric-based representations by incorporating extensive temporal information into the update step of bisimulation metric learning. It learns state distances within a temporal framework that considers both future dynamics and cumulative rewards over current and long-term future states. Our learning strategy effectively incorporates future behavioral information into the representation space without introducing a significant number of additional parameters for modeling dynamics. Extensive experiments conducted in DeepMind Control and Meta-World environments demonstrate that SCR achieves better performance comparing to other recent metric-based methods in demanding generalization tasks. The codes of SCR are available in https://github.com/jianda-chen/SCR.

Jianda Chen, Wen Zheng Terence Ng, Zichen Chen, Sinno Jialin Pan, Tianwei Zhang• 2024

Related benchmarks

TaskDatasetResultRank
Continuous ControlDMControl 500k
Spin Score738.3
33
C-SwingUpSparseDM_Control
Average Return768.4
6
Ch-RunDM Control
Average Return778.4
6
Continuous ControlDM_Control distraction setting (test)
BiC-Catch Score221.3
6
F-SpinDM_Control
Average Return964.9
6
Robotic ManipulationMeta-World v2
Success Rate96.9
6
BiC-CatchDM_Control
Average Return944.2
6
C-SwingUpDM_Control
Average Return849.4
6
H-StandDM Control
Average Return851.3
6
R-EasyDM_Control
Average Return946.8
6
Showing 10 of 11 rows

Other info

Code

Follow for update