
Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning

About

Cross-domain offline reinforcement learning leverages source-domain data with diverse transition dynamics to reduce the amount of target-domain data required. However, simply merging the data of the two domains degrades performance because of the dynamics mismatch. Existing methods address this problem by measuring the dynamics gap with domain classifiers, while relying on assumptions about the transferability between the paired domains. In this paper, we propose a novel representation-based approach to measuring the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains. We show that this objective recovers the mutual-information gap between the transition functions of the two domains, without the unboundedness that affects the dynamics gap when the domains differ significantly. Based on the representations, we introduce a data filtering algorithm that selectively shares transitions from the source domain according to the contrastive score functions. Empirical results on various tasks demonstrate that our method achieves superior performance: using only 10% of the target data, it reaches 89.2% of the performance that state-of-the-art methods obtain with 100% of the target dataset.
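The two ingredients named in the abstract, a contrastive objective and score-based filtering of source transitions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the use of an InfoNCE-style loss, the function names `info_nce_loss` and `filter_source`, and the `keep_fraction` parameter are all assumptions made for illustration, and the scores are taken as given rather than produced by a learned encoder.

```python
import numpy as np


def info_nce_loss(pos_scores, neg_scores):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    pos_scores: shape (B,), logits for target-domain transitions (positives).
    neg_scores: shape (B, K), logits for source-domain transitions (negatives).
    """
    # Put the positive logit in column 0, followed by the K negatives.
    logits = np.concatenate([pos_scores[:, None], neg_scores], axis=1)
    # Subtract the row-wise max for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the positive as the correct class.
    return -log_softmax[:, 0].mean()


def filter_source(scores, keep_fraction):
    """Return sorted indices of the highest-scoring source transitions.

    keep_fraction is a hypothetical hyperparameter in (0, 1].
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(np.ceil(keep_fraction * len(scores))))
    # argsort is ascending, so the last n_keep entries have the top scores.
    return np.sort(np.argsort(scores)[-n_keep:])


# Example with made-up scores for four source transitions.
scores = np.array([0.1, 0.9, 0.5, 0.7])
kept = filter_source(scores, keep_fraction=0.5)
```

In this sketch the retained source transitions would be merged with the target dataset before running an offline RL algorithm; how the scores are learned and how the threshold is chosen are left to the paper.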

Xiaoyu Wen, Chenjia Bai, Kang Xu, Xudong Yu, Yang Zhang, Xuelong Li, Zhen Wang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score: 61.9 | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score: 43.3 | 115 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score: 54.9 | 72 |
| Offline Reinforcement Learning | D4RL Walker2d Medium v2 | Normalized Return: 51.8 | 67 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score: 45.5 | 59 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score: 24.2 | 59 |
| Offline Reinforcement Learning | D4RL halfcheetah v2 (medium-replay) | Normalized Score: 22.9 | 58 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score: 33 | 58 |
| Offline Reinforcement Learning | D4RL walker2d-expert v2 | Normalized Score: 93.7 | 56 |
| Offline Reinforcement Learning | D4RL hopper-expert v2 | Normalized Score: 70.1 | 56 |
Showing 10 of 150 rows
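
For readers unfamiliar with the metric, D4RL normalized scores are conventionally computed by rescaling the raw episode return so that a random policy maps to 0 and an expert policy maps to 100. The sketch below assumes the results listed here follow that convention; the function name and the reference returns in the example are hypothetical and are not the actual D4RL reference values for any environment.

```python
def normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so that random -> 0 and expert -> 100."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, for illustration only.
score = normalized_score(raw_return=50.0, random_return=0.0, expert_return=100.0)
```

Scores above 100 are possible under this convention when a policy outperforms the expert reference.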