
Efficient Cross-Domain Offline Reinforcement Learning with Dynamics- and Value-Aligned Data Filtering

About

Cross-domain offline reinforcement learning (RL) aims to train a well-performing agent in a target environment by leveraging both a limited target-domain dataset and a source-domain dataset with (possibly) broader data coverage. Because of the underlying dynamics misalignment between the source and target domains, naively merging the two datasets can degrade performance. Recent advances address this issue by selectively leveraging source-domain samples whose dynamics align well with the target domain. However, our work demonstrates that dynamics alignment alone is insufficient: we examine the limitations of prior frameworks and derive a new target-domain sub-optimality bound for the policy learned on the source domain. More importantly, our theory underscores an additional need for value alignment, i.e., selecting high-quality, high-value samples from the source domain, a critical dimension overlooked by existing work. Motivated by this theoretical insight, we propose Dynamics- and Value-aligned Data Filtering (DVDF), a novel unified cross-domain RL framework that selectively incorporates source-domain samples exhibiting strong alignment in both dynamics and values. We empirically study a range of dynamics-shift scenarios, including kinematic and morphology shifts, and evaluate DVDF on diverse tasks and datasets, including the challenging setting where the target-domain dataset contains an extremely limited amount of data. Extensive experiments demonstrate that DVDF consistently outperforms strong baselines by significant margins.
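The abstract describes filtering source-domain transitions along two axes: dynamics alignment (how well a transition matches target-domain dynamics) and value alignment (how high its estimated value is). The paper does not specify the scoring functions or thresholds here, so the following is only a minimal illustrative sketch under assumed inputs: a precomputed per-sample dynamics gap (lower is better) and a per-sample value estimate (higher is better), filtered with simple quantile thresholds.

```python
import numpy as np

def dvdf_mask(dynamics_gap, value_est, dyn_quantile=0.5, val_quantile=0.5):
    """Keep source-domain samples that satisfy BOTH criteria:
    - dynamics gap in the lowest `dyn_quantile` fraction of the batch, and
    - value estimate in the highest `val_quantile` fraction of the batch.
    The quantile thresholds are illustrative assumptions, not the paper's choice."""
    dyn_ok = dynamics_gap <= np.quantile(dynamics_gap, dyn_quantile)
    val_ok = value_est >= np.quantile(value_est, 1.0 - val_quantile)
    return dyn_ok & val_ok

# Toy batch: random stand-in scores for 1000 source-domain transitions.
rng = np.random.default_rng(0)
dynamics_gap = rng.random(1000)
value_est = rng.random(1000)

mask = dvdf_mask(dynamics_gap, value_est)
# With independent scores and 0.5 quantiles, roughly a quarter of samples
# survive both filters; the kept subset is then merged with the target data.
print(mask.sum())
```

Filtering on the intersection of both masks is what distinguishes this sketch from dynamics-only selection: a sample that matches target dynamics perfectly but has low value is still discarded.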

Zhongjian Qiao, Rui Yang, Jiafei Lyu, Chenjia Bai, Xiu Li, Siyang Gao, Shuang Qiu • 2025

Related benchmarks

Task                            Dataset                    Result                 Rank
Offline Reinforcement Learning  hopper medium              Normalized Score 20.3  52
Offline Reinforcement Learning  walker2d medium            Normalized Score 24.3  51
Offline Reinforcement Learning  walker2d medium-replay     Normalized Score 4.8   50
Offline Reinforcement Learning  hopper medium-replay       Normalized Score 7.4   44
Offline Reinforcement Learning  halfcheetah medium-replay  Normalized Score 25.1  43
Offline Reinforcement Learning  halfcheetah medium         Normalized Score 26.7  43
Offline Reinforcement Learning  walker2d medium-expert     Normalized Score 23.0  31
Offline Reinforcement Learning  hopper medium-expert       Normalized Score 43.2  24
Offline Reinforcement Learning  hopper expert              Normalized Score 48.9  19
Offline Reinforcement Learning  halfcheetah medium-expert  Normalized Score 21.9  15

Showing 10 of 53 rows
