
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts

About

Single-domain offline reinforcement learning (RL) often suffers from limited data coverage, while cross-domain offline RL handles this issue by leveraging additional data from other domains with dynamics shifts. However, existing studies primarily focus on train-time robustness (handling dynamics shifts from training data), neglecting the test-time robustness against dynamics perturbations when deployed in practical scenarios. In this paper, we investigate dual (both train-time and test-time) robustness against dynamics shifts in cross-domain offline RL. We first empirically show that the policy trained with cross-domain offline RL exhibits fragility under dynamics perturbations during evaluation, particularly when target domain data is limited. To address this, we introduce a novel robust cross-domain Bellman (RCB) operator, which enhances test-time robustness against dynamics perturbations while staying conservative to the out-of-distribution dynamics transitions, thus guaranteeing the train-time robustness. To further counteract potential value overestimation or underestimation caused by the RCB operator, we introduce two techniques, the dynamic value penalty and the Huber loss, into our framework, resulting in the practical Dual-RObust Cross-domain Offline RL (DROCO) algorithm. Extensive empirical results across various dynamics shift scenarios show that DROCO outperforms strong baselines and exhibits enhanced robustness to dynamics perturbations.
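The two stabilizing techniques named in the abstract can be illustrated in a toy critic update. This is a minimal sketch, not the authors' implementation: the robust cross-domain Bellman (RCB) operator and the *dynamic* value penalty are simplified here to a standard Bellman target with a static penalty `beta`, and all function names (`huber`, `penalized_td_target`, `critic_loss`) are hypothetical.

```python
def huber(x, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails,
    so extreme TD errors contribute less than under a squared loss."""
    ax = abs(x)
    return 0.5 * x * x if ax <= delta else delta * (ax - 0.5 * delta)

def penalized_td_target(r, q_next, gamma=0.99, beta=0.5):
    """Bellman target minus a value penalty. A fixed `beta` stands in for
    DROCO's dynamic penalty, which the paper uses to stay conservative on
    out-of-distribution dynamics transitions."""
    return r + gamma * q_next - beta

def critic_loss(q, r, q_next, gamma=0.99, beta=0.5, delta=1.0):
    """TD error passed through the Huber loss, tempering the value
    over/under-estimation the abstract attributes to the RCB operator."""
    return huber(q - penalized_td_target(r, q_next, gamma, beta), delta)
```

The point of the sketch is only the interaction of the two techniques: the penalty shifts the target downward for conservatism, and the Huber loss bounds the gradient of large residuals.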

Zhongjian Qiao, Rui Yang, Jiafei Lyu, Xiu Li, Zhongxiang Dai, Zhuoran Yang, Siyang Gao, Shuang Qiu • 2025

Related benchmarks

Task                            Dataset                              Metric             Result  Rank
Offline Reinforcement Learning  D4RL halfcheetah-medium-expert       Normalized Score    70.1   117
Offline Reinforcement Learning  D4RL hopper-medium-expert            Normalized Score    82.3   115
Offline Reinforcement Learning  D4RL Medium-Replay Hopper            Normalized Score    51.6    72
Offline Reinforcement Learning  D4RL Walker2d Medium v2              Normalized Return   70.8    67
Offline Reinforcement Learning  D4RL Medium HalfCheetah              Normalized Score    45.8    59
Offline Reinforcement Learning  D4RL Medium-Replay HalfCheetah       Normalized Score    27.9    59
Offline Reinforcement Learning  D4RL Medium Walker2d                 Normalized Score    60.1    58
Offline Reinforcement Learning  D4RL halfcheetah v2 (medium-replay)  Normalized Score    26.9    58
Offline Reinforcement Learning  D4RL walker2d-expert v2              Normalized Score   106      56
Offline Reinforcement Learning  D4RL hopper-expert v2                Normalized Score    89.3    56

(Showing 10 of 53 rows.)
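The scores above follow D4RL's normalization convention, which rescales raw returns so that a random policy scores 0 and an expert policy scores 100. A minimal sketch of that convention (the per-environment random/expert reference returns are environment-specific and not shown here; the values in the usage assertion are illustrative, not real D4RL references):

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    """D4RL normalization: 100 * (raw - random) / (expert - random).
    Scores above 100 mean the policy outperformed the expert reference."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)
```

For example, with illustrative references `random_return=0` and `expert_return=100`, a raw return of 50 maps to a normalized score of 50.0.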
