Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
About
Single-domain offline reinforcement learning (RL) often suffers from limited data coverage, while cross-domain offline RL mitigates this issue by leveraging additional data from other domains with dynamics shifts. However, existing studies primarily focus on train-time robustness (handling dynamics shifts from training data), neglecting test-time robustness against dynamics perturbations when the policy is deployed in practical scenarios. In this paper, we investigate dual (both train-time and test-time) robustness against dynamics shifts in cross-domain offline RL. We first empirically show that a policy trained with cross-domain offline RL is fragile under dynamics perturbations during evaluation, particularly when target-domain data is limited. To address this, we introduce a novel robust cross-domain Bellman (RCB) operator, which enhances test-time robustness against dynamics perturbations while staying conservative toward out-of-distribution dynamics transitions, thus guaranteeing train-time robustness. To further counteract potential value overestimation or underestimation caused by the RCB operator, we incorporate two techniques, a dynamic value penalty and the Huber loss, into our framework, resulting in the practical **D**ual-**RO**bust **C**ross-domain **O**ffline RL (DROCO) algorithm. Extensive empirical results across various dynamics-shift scenarios show that DROCO outperforms strong baselines and exhibits enhanced robustness to dynamics perturbations.
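To make the two stabilizing ingredients concrete, the sketch below illustrates (a) a Huber loss for TD regression and (b) a pessimistic Bellman target that takes the worst case over value estimates under sampled dynamics perturbations and subtracts a value penalty. This is a minimal illustration under our own assumptions (the function names, the fixed `penalty_coef`, and the min-over-perturbations form are placeholders), not the exact DROCO operator from the paper.

```python
import numpy as np

def huber_loss(td_error, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails,
    which damps the effect of outlier TD errors on value regression."""
    abs_err = np.abs(td_error)
    quadratic = np.minimum(abs_err, delta)
    linear = abs_err - quadratic
    return 0.5 * quadratic ** 2 + delta * linear

def robust_bellman_target(reward, next_values, penalty_coef, gamma=0.99):
    """Illustrative robust backup: pessimistic minimum over candidate
    next-state values (one per sampled dynamics perturbation), with a
    scalar value penalty standing in for the dynamic penalty term."""
    worst_case = np.min(next_values, axis=-1)  # pessimism over perturbed dynamics
    return reward + gamma * (worst_case - penalty_coef)
```

In practice the penalty coefficient would be adapted to how far a transition lies from the target-domain data, rather than held fixed as here.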
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 70.1 | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 82.3 | 115 |
| Offline Reinforcement Learning | D4RL hopper-medium-replay | Normalized Score | 51.6 | 72 |
| Offline Reinforcement Learning | D4RL walker2d-medium v2 | Normalized Return | 70.8 | 67 |
| Offline Reinforcement Learning | D4RL halfcheetah-medium | Normalized Score | 45.8 | 59 |
| Offline Reinforcement Learning | D4RL halfcheetah-medium-replay | Normalized Score | 27.9 | 59 |
| Offline Reinforcement Learning | D4RL walker2d-medium | Normalized Score | 60.1 | 58 |
| Offline Reinforcement Learning | D4RL halfcheetah-medium-replay v2 | Normalized Score | 26.9 | 58 |
| Offline Reinforcement Learning | D4RL walker2d-expert v2 | Normalized Score | 106.0 | 56 |
| Offline Reinforcement Learning | D4RL hopper-expert v2 | Normalized Score | 89.3 | 56 |
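The scores in the table follow the standard D4RL normalization, which rescales a raw episode return so that 0 corresponds to a random policy and 100 to an expert policy. A minimal sketch (the reference returns passed in here are illustrative placeholders, not the official D4RL constants):

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    """D4RL normalized score:
    100 * (raw - random) / (expert - random),
    so a random policy scores 0 and an expert policy scores 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# Illustrative reference returns (placeholders, not official D4RL values):
print(d4rl_normalized_score(raw_return=50.0, random_return=0.0, expert_return=100.0))
```

Scores above 100, such as walker2d-expert above, simply mean the policy's raw return exceeds the expert reference return.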