Efficient Cross-Domain Offline Reinforcement Learning with Dynamics- and Value-Aligned Data Filtering
About
Cross-domain offline reinforcement learning (RL) aims to train a well-performing agent in the target environment by leveraging both a limited target domain dataset and a source domain dataset with (possibly) sufficient data coverage. Due to the underlying dynamics misalignment between the source and target domains, naively merging the two datasets may degrade performance. Recent advances address this issue by selectively leveraging source domain samples whose dynamics align well with the target domain. However, our work demonstrates that dynamics alignment alone is insufficient, by examining the limitations of prior frameworks and deriving a new target domain sub-optimality bound for the policy learned on the source domain. More importantly, our theory underscores an additional need for *value alignment*, i.e., selecting high-quality, high-value samples from the source domain, a critical dimension overlooked by existing works. Motivated by this theoretical insight, we propose the **D**ynamics- and **V**alue-aligned **D**ata **F**iltering (DVDF) method, a novel unified cross-domain RL framework that selectively incorporates source domain samples exhibiting strong alignment in *both dynamics and values*. We empirically study a range of dynamics-shift scenarios, including kinematic and morphology shifts, and evaluate DVDF on various tasks and datasets, even in the challenging setting where the target domain dataset contains an extremely limited amount of data. Extensive experiments demonstrate that DVDF consistently outperforms strong baselines with significant improvements.
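The two-stage filtering idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `filter_source_samples`, the per-sample `dynamics_gap` scores (e.g., a discrepancy under a learned target dynamics model), and the `value_est` estimates (e.g., from a learned value function) are all assumed inputs, and the specific threshold/quantile rules are placeholders for whatever criteria DVDF actually uses.

```python
import numpy as np

def filter_source_samples(dynamics_gap, value_est,
                          dyn_threshold, value_quantile=0.5):
    """Hypothetical sketch of dynamics- and value-aligned filtering.

    dynamics_gap : per-sample dynamics discrepancy between source and
                   target domains (lower = better aligned), shape (N,)
    value_est    : per-sample value estimate (higher = better), shape (N,)
    Returns a boolean mask over the source batch.
    """
    # Stage 1 (dynamics alignment): keep samples whose dynamics
    # discrepancy with the target domain is below a threshold.
    dyn_mask = dynamics_gap < dyn_threshold
    if not dyn_mask.any():
        return np.zeros_like(dyn_mask)

    # Stage 2 (value alignment): among dynamics-aligned samples, keep
    # only those whose estimated value reaches a quantile cutoff,
    # so the retained data is both transferable and high-quality.
    cutoff = np.quantile(value_est[dyn_mask], value_quantile)
    return dyn_mask & (value_est >= cutoff)
```

The design point the abstract argues for is visible here: stage 1 alone (the prior approach) would admit well-aligned but low-value transitions; the stage-2 cutoff discards those as well.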
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | hopper medium | Normalized Score | 20.3 | 52 |
| Offline Reinforcement Learning | walker2d medium | Normalized Score | 24.3 | 51 |
| Offline Reinforcement Learning | walker2d medium-replay | Normalized Score | 4.8 | 50 |
| Offline Reinforcement Learning | hopper medium-replay | Normalized Score | 7.4 | 44 |
| Offline Reinforcement Learning | halfcheetah medium-replay | Normalized Score | 25.1 | 43 |
| Offline Reinforcement Learning | halfcheetah medium | Normalized Score | 26.7 | 43 |
| Offline Reinforcement Learning | walker2d medium-expert | Normalized Score | 23 | 31 |
| Offline Reinforcement Learning | hopper medium-expert | Normalized Score | 43.2 | 24 |
| Offline Reinforcement Learning | hopper expert | Normalized Score | 48.9 | 19 |
| Offline Reinforcement Learning | halfcheetah medium-expert | Normalized Score | 21.9 | 15 |