Dynamics Are Learned, Not Told: Semi-Supervised Discovery of Latent Dynamics Geometries for Zero-Shot Policy Adaptation

About

Real-world dynamics shifts pose a critical challenge for reinforcement learning in robotics: policies tightly coupled to nominal environments often fail catastrophically when physical conditions change. Most existing methods encode explicitly identified physical parameters into a latent context. This parameter-centric paradigm depends on pre-specified axes of variation and becomes brittle under unmodeled or compound dynamics changes. We revisit dynamics adaptation from an outcome-centric perspective: rather than telling policies what the dynamics are, we enable them to learn how dynamics affect interaction outcomes. Theoretically, this perspective is grounded in a monotonic relationship between target-domain regret and the Lipschitz constant of a trajectory dynamics encoder. Practically, this constant can be upper-bounded through contrastive learning, yielding a smooth, task-relevant latent topology without privileged dynamics information. On MuJoCo benchmarks, our method consistently outperforms parameter-centric baselines under severe dynamics shifts, including unmodeled and time-varying parameters, while also improving in-distribution stability and latent interpretability. Overall, these results support controlling latent geometry as a principled mechanism for robust adaptation.
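The two quantities the abstract ties together, the Lipschitz constant of a trajectory encoder and a contrastive objective over trajectories, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the encoder (one linear layer plus tanh over flattened observation-action windows), the use of the standard InfoNCE loss, the temperature, the synthetic "second view" data, and every name and shape below are assumptions made for illustration. The sketch only shows how each quantity is computed; it does not reproduce the paper's bound linking contrastive training to the Lipschitz constant.

```python
import numpy as np

# Hypothetical toy encoder (not the paper's architecture): flatten each
# window of T (observation, action) steps and apply a linear map + tanh.
def encode(W, trajectories):
    flat = trajectories.reshape(len(trajectories), -1)  # (N, T*D)
    return np.tanh(flat @ W)                             # (N, K) latents

def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def empirical_lipschitz(W, trajectories):
    """Largest ratio ||f(x) - f(y)|| / ||x - y|| over all sampled pairs.
    This is a lower estimate of the true Lipschitz constant; it assumes
    at least two distinct windows are given."""
    flat = trajectories.reshape(len(trajectories), -1)
    z = encode(W, trajectories)
    i, j = np.triu_indices(len(flat), k=1)               # all unordered pairs
    dx = np.linalg.norm(flat[i] - flat[j], axis=1)
    dz = np.linalg.norm(z[i] - z[j], axis=1)
    mask = dx > 0
    return float(np.max(dz[mask] / dx[mask]))

# Usage with synthetic data (all sizes are arbitrary placeholders).
rng = np.random.default_rng(0)
N, T, D, K = 8, 10, 4, 16      # windows, steps per window, obs+act dim, latent dim
W = rng.normal(scale=0.1, size=(T * D, K))
windows_a = rng.normal(size=(N, T, D))
# Assumed second "view": a small perturbation standing in for another
# window collected under the same dynamics.
windows_b = windows_a + 0.01 * rng.normal(size=(N, T, D))

loss = info_nce(encode(W, windows_a), encode(W, windows_b))
lip = empirical_lipschitz(W, windows_a)
# tanh is 1-Lipschitz, so the spectral norm of W upper-bounds the estimate.
bound = np.linalg.norm(W, 2)
print(f"InfoNCE loss {loss:.3f}, Lipschitz estimate {lip:.3f} <= bound {bound:.3f}")
```

The tanh layer is chosen only because its Lipschitz constant is easy to read off (at most the spectral norm of `W`); a real trajectory encoder would be deeper, and its Lipschitz constant would have to be estimated or bounded by other means.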

Zhiming Xu, Weitao Zhou, Xianghui Pan, Nanshan Deng, Chengju Liu, Qijun Chen, Chenpeng Yao • 2026

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Locomotion	Hopper (in-distribution)	Mean Cumulative Reward	3.24e+3	6
Locomotion	Walker2d (in-distribution)	Mean Cumulative Reward	4.88e+3	6
Locomotion	Ant (in-distribution)	Mean Cumulative Reward	5.18e+3	6
Locomotion	Hopper Mass Scale 0.5x	Cumulative Reward	3.15e+3	6
Locomotion	Hopper Mass Scale 1.0x	Cumulative Reward	3.43e+3	6
Locomotion	Hopper Time-Varying Dynamics	Cumulative Reward	3.29e+3	6
Locomotion	Hopper Structural Failure	Cumulative Reward	596	6
Locomotion	Walker2d Damping Scale 0.3x	Cumulative Reward	5.34e+3	6
Locomotion	Walker2d Damping Scale 1.0x	Cumulative Reward	5.32e+3	6
Locomotion	Walker2d Damping Scale 2.2x	Cumulative Reward	5.35e+3	6

(10 of 24 benchmark rows shown)
