HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation
About
Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers a latent dynamics context from recent state-action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show that HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.
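The two-stage recipe described above is straightforward to prototype. Below is a minimal PyTorch sketch of both stages: a teacher that encodes a window of recent state-action pairs into a latent dynamics context, and a DAgger-style online distillation step that regresses a keypoint-conditioned transformer student onto the teacher's actions. All module names (`HistoryEncoder`, `TeacherPolicy`, `StudentPolicy`, `distill_step`), layer sizes, and the MSE distillation loss are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of HoRD's two stages (assumed shapes and names; not the official code).
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Infers a latent dynamics context z from the last H state-action pairs."""
    def __init__(self, state_dim, action_dim, latent_dim=32, hidden=128):
        super().__init__()
        self.gru = nn.GRU(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, states, actions):           # (B, H, S), (B, H, A)
        h, _ = self.gru(torch.cat([states, actions], dim=-1))
        return self.head(h[:, -1])                 # (B, latent_dim)

class TeacherPolicy(nn.Module):
    """History-conditioned teacher: acts on the current state plus inferred context."""
    def __init__(self, state_dim, action_dim, latent_dim=32):
        super().__init__()
        self.encoder = HistoryEncoder(state_dim, action_dim, latent_dim)
        self.pi = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ELU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state, hist_states, hist_actions):
        z = self.encoder(hist_states, hist_actions)    # online adaptation signal
        return self.pi(torch.cat([state, z], dim=-1))

class StudentPolicy(nn.Module):
    """Transformer student conditioned on sparse root-relative 3D keypoints."""
    def __init__(self, state_dim, action_dim, d_model=128):
        super().__init__()
        self.kp_embed = nn.Linear(3, d_model)          # one token per keypoint
        self.state_embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, state, keypoints):               # (B, S), (B, K, 3)
        tokens = torch.cat(
            [self.state_embed(state)[:, None], self.kp_embed(keypoints)], dim=1)
        return self.head(self.encoder(tokens)[:, 0])   # read out from the state token

def distill_step(teacher, student, batch, opt):
    """One DAgger-style online distillation update: the batch comes from student
    rollouts, and the (pre-trained, frozen) teacher relabels it with target actions."""
    with torch.no_grad():
        target = teacher(batch["state"], batch["hist_states"], batch["hist_actions"])
    loss = nn.functional.mse_loss(student(batch["state"], batch["keypoints"]), target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In this sketch the teacher would first be trained with a standard on-policy RL algorithm over randomized dynamics; during distillation the student never sees the privileged state-action history and instead conditions only on the sparse keypoint targets, mirroring the teacher-student split described above.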
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Humanoid Control | AMASS IsaacLab ID (test) | Success Rate | 90.7 | 5 |
| Humanoid Control | AMASS IsaacLab + DR ID (test) | Ego-MPJPE (mm) | 124 | 5 |
| Humanoid Control | AMASS Genesis OOD (test) | Ego-MPJPE (mm) | 162 | 5 |
| Humanoid Control | AMASS Genesis + DR OOD (test) | Ego-MPJPE (mm) | 171 | 5 |