Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics

About

Cross-domain reinforcement learning (CDRL) aims to improve the data efficiency of RL by leveraging data samples collected in a source domain to facilitate learning in a similar target domain. Despite its potential, cross-domain transfer in RL is known to face two fundamental and intertwined challenges: (i) the source and target domains can have distinct state or action spaces, which makes direct transfer infeasible and requires more sophisticated inter-domain mappings; (ii) the transferability of a source-domain model in RL is not easily identifiable a priori, and hence CDRL can be prone to negative transfer. In this paper, we propose to jointly tackle these two challenges through the lens of cross-domain Bellman consistency and a hybrid critic. Specifically, we first introduce the notion of cross-domain Bellman consistency as a way to measure the transferability of a source-domain model. Then, we propose QAvatar, which combines the Q functions from both the source and target domains with an adaptive, hyperparameter-free weight function. Through this design, we characterize the convergence behavior of QAvatar and show that QAvatar achieves reliable transfer in the sense that it effectively leverages a source-domain Q function for knowledge transfer to the target domain. Through experiments, we demonstrate that QAvatar achieves favorable transferability across various RL benchmark tasks, including locomotion and robot arm manipulation. Our code is available at https://rl-bandits-lab.github.io/Cross-Domain-RL/.
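To make the hybrid-critic idea concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes that transferability is scored by a mean squared one-step Bellman residual on target-domain transitions, and that the weight is a simple error ratio. The abstract does not give the actual weight function, so `bellman_error`, `hybrid_weight`, `hybrid_q`, and the ratio formula are all hypothetical names and choices introduced for illustration. The source critic is assumed to be already composed with the inter-domain state and action mappings.

```python
import numpy as np


def bellman_error(q, transitions, gamma=0.99):
    """Mean squared one-step Bellman residual of q on target-domain data.

    transitions is a list of (s, a, r, s_next, a_next) tuples.
    This residual is an assumed proxy for cross-domain Bellman consistency.
    """
    residuals = [
        q(s, a) - (r + gamma * q(s_next, a_next))
        for s, a, r, s_next, a_next in transitions
    ]
    return float(np.mean(np.square(residuals)))


def hybrid_weight(err_source, err_target, eps=1e-12):
    """Weight placed on the source critic (illustrative ratio, an assumption).

    The weight grows as the source critic becomes more Bellman-consistent
    relative to the target critic, and has no tunable hyperparameter
    apart from the numerical guard eps.
    """
    return err_target / (err_source + err_target + eps)


def hybrid_q(q_source_mapped, q_target, alpha):
    """Convex combination of the mapped source critic and the target critic."""
    return lambda s, a: alpha * q_source_mapped(s, a) + (1.0 - alpha) * q_target(s, a)


# Toy usage with scalar states/actions and constant critics.
data = [(0.0, 0.0, 1.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0, 0.0)]
q_src = lambda s, a: 0.0   # stands in for a mapped source-domain critic
q_tar = lambda s, a: 2.0   # stands in for a target-domain critic
alpha = hybrid_weight(bellman_error(q_src, data), bellman_error(q_tar, data))
q_mix = hybrid_q(q_src, q_tar, alpha)
```

Under this assumed rule, a source critic with a large Bellman residual on target data receives a weight near zero, so the hybrid critic falls back to the target critic. This mirrors the abstract's goal of avoiding negative transfer.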

Ming-Hong Chen, Kuan-Chen Pan, You-De Huang, Xi Liu, Ping-Chun Hsieh • 2026

Related benchmarks

Task	Dataset	Result
Continuous Control	MuJoCo Ant	Average Reward: 2.86e+3	26
Continuous Control	MuJoCo HalfCheetah	Average Reward: 1.16e+4	25
Navigation	Navigation	--	24
Continuous Control	Robosuite Door Opening	Final Reward: 216.6	7
Continuous Control	Robosuite Table Wiping	Final Reward: 76.6	7
Continuous Control	Safety-Gym Navigation	Final Reward: 38.5	7
Continuous Control	Halfcheetah	Steps to Threshold: 1.26e+5	2
Robotic Manipulation	Door Opening	Environment Steps: 48	2
Robotic Manipulation	Table Wiping	Environment Steps to Threshold: 7.20e+4	2
