Domain Adaptation In Reinforcement Learning Via Latent Unified State Representation

About

Despite the recent success of deep reinforcement learning (RL), domain adaptation remains an open problem. Although the generalization ability of RL agents is critical for the real-world applicability of deep RL, zero-shot policy transfer is still challenging, since even minor visual changes can make a trained agent fail completely in the new task. To address this issue, we propose a two-stage RL agent that, in the first stage, learns a latent unified state representation (LUSR) that is consistent across multiple domains and, in the second stage, performs RL training in a single source domain on top of LUSR. The cross-domain consistency of LUSR allows the policy acquired in the source domain to generalize to other target domains without extra training. We first demonstrate our approach on variants of the CarRacing game with customized manipulations, and then verify it in CARLA, an autonomous driving simulator with more complex and realistic visual observations. Our results show that this approach achieves state-of-the-art domain adaptation performance on related RL tasks and outperforms prior approaches based on latent-representation-based RL and on image-to-image translation.

Jinwei Xing, Takashi Nagata, Kexin Chen, Xinyun Zou, Emre Neftci, Jeffrey L. Krichmar (2021)
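The two-stage recipe in the abstract can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the authors' implementation: the 1-D world, DOMAIN_OFFSETS, fit_encoder, and the threshold policy are all hypothetical. Stage 1 is replaced by per-domain mean subtraction (which, unlike the learned LUSR encoder, needs the domain label), and stage 2 by a grid search over a one-parameter policy rather than a real RL algorithm. The point is only the structure: in this sketch the representation is fit on reward-free observations from both domains, the policy is trained on the source domain alone, and it is then applied to the target domain without further training.

```python
import random

# Hypothetical toy world (not from the paper): the true state s is a 1-D position,
# and each domain renders it with its own offset, standing in for a visual change.
DOMAIN_OFFSETS = {"source": 0.0, "target": 3.0}


def observe(s, domain):
    """Render the true state s as an observation in the given domain."""
    return s + DOMAIN_OFFSETS[domain]


def step(s, action):
    """Move 0.1 left (-1) or right (+1); the reward is -|new position|."""
    s_next = s + 0.1 * action
    return s_next, -abs(s_next)


# Stage 1: build a unified state representation from reward-free observations.
# Simplified stand-in for LUSR's learned encoder: estimate each domain's offset
# as its mean observation and subtract it (this encoder needs the domain label).
def fit_encoder(domains, n=2000, seed=0):
    rng = random.Random(seed)
    offsets = {}
    for d in domains:
        obs = [observe(rng.uniform(-1, 1), d) for _ in range(n)]
        offsets[d] = sum(obs) / n
    return lambda o, d: o - offsets[d]


# Stage 2: train a policy in the source domain only. A grid search over a
# one-parameter threshold policy stands in for a real RL algorithm.
def make_policy(theta):
    return lambda x: 1 if x < theta else -1


def evaluate(policy, features, domain, episodes=200, horizon=20, seed=1):
    """Average return when the policy acts on features(observation, domain)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        s = rng.uniform(-1, 1)
        for _ in range(horizon):
            s, r = step(s, policy(features(observe(s, domain), domain)))
            total += r
    return total / episodes


def train(features):
    thetas = [i / 10 for i in range(-20, 21)]  # candidate thresholds in [-2, 2]
    return max(thetas, key=lambda t: evaluate(make_policy(t), features, "source"))


def raw(o, d):
    """Baseline features: act directly on the raw observation."""
    return o


encoder = fit_encoder(["source", "target"])

results = {}
for name, features in [("raw", raw), ("latent", encoder)]:
    policy = make_policy(train(features))  # trained on the source domain only
    results[name] = {d: evaluate(policy, features, d) for d in ("source", "target")}
    print(name, {d: round(v, 2) for d, v in results[name].items()})
```

Under these assumptions, the raw-observation policy should keep walking away from the goal in the target domain, because its threshold was tuned to source-domain observations, while the policy acting on the unified latent should keep roughly its source-domain return.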

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Reach	Meta-World ML-1 (test)	Success Rate	46	9
Autonomous Driving	CARLA Map 2 (Source)	Cumulative Reward	2.28e+3	6
Reach	egocentric-Metaworld Source	Success Rate	100	6
Reach-Wall	egocentric-Metaworld (Seen Target)	Success Rate	33.3	6
Reach-Wall	egocentric-Metaworld (Unseen Target)	Success Rate	30.7	6
Autonomous Driving	CARLA Map 1 (Source)	Cumulative Reward	2.14e+3	6
Autonomous Driving	CARLA Map 1 (Unseen Target)	Cumulative Reward	1.07e+3	6
Autonomous Driving	CARLA Map 2 (Seen Target)	Sum of Rewards	1.17e+3	6
Object Goal Navigation	AI2THOR Source domains	Success Rate	53.3	6
Object Goal Navigation	AI2THOR Seen target domains	Success Rate	21.3	6

Only 10 of the 18 benchmark rows are shown.
