Dynamics-Aligned Shared Hypernetworks for Contextual RL under Discontinuous Shifts

About

Zero-shot generalization in contextual reinforcement learning remains a core challenge, particularly when the context is latent and must be inferred from data. A canonical failure mode arises when latent context discontinuously changes how actions affect the environment, requiring incompatible control responses across contexts. We propose DMA*-SH, a framework where a single hypernetwork, trained solely via dynamics prediction, generates a small set of adapter weights shared across the dynamics model, policy, and action-value function. This shared modulation imparts an inductive bias matched to discontinuous context-to-dynamics shifts, while input/output normalization and random input masking stabilize context inference, promoting directionally concentrated representations. We provide theoretical support via expressivity separation results for hypernetwork modulation, and a variance decomposition with policy-gradient variance bounds that formalize how within-mode compression improves learning under non-overlapping contexts. For evaluation, we introduce the Actuator Inversion Benchmark (AIB), a suite of environments designed to isolate challenging context-to-dynamics interactions, including actuator inversion, actuator permutations, and weakly non-overlapping continuous dynamics. On AIB's held-out tasks, DMA*-SH achieves zero-shot generalization, outperforming domain randomization by 58.1% and surpassing a standard context-aware baseline by 11.5% on average.
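The abstract describes the shared-modulation mechanism only in words. Below is a minimal plain-Python sketch of "one hypernetwork, one adapter shared by three networks". Every concrete choice is an assumption, not the authors' design: the FiLM-style per-unit scale/shift adapter, the layer sizes, the one-layer context encoder, and the names `SharedHypernet`, `AdaptedMLP` and `standardize_and_mask` are all hypothetical. The sketch covers input standardization and random input masking; output normalization is omitted. Training is not shown. Per the abstract, only the dynamics-prediction loss would update the hypernetwork, which in an autodiff framework would typically mean the policy and critic treat `adapter` as a constant (stop-gradient).

```python
import math
import random


def linear(W, b, x):
    """Dense layer: returns W @ x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(n_out, n_in, rng):
    """Gaussian weights with std 1/sqrt(n_in), zero biases."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def standardize_and_mask(x, mean, std, p_mask, rng):
    """Hypothetical encoder preprocessing: standardize each feature,
    then zero it out with probability p_mask (random input masking)."""
    return [0.0 if rng.random() < p_mask else (xi - m) / s
            for xi, m, s in zip(x, mean, std)]


class SharedHypernet:
    """Hypothetical hypernetwork: context embedding z -> FiLM-style adapter
    (per-unit scale and shift) for a hidden layer of width `hidden`."""

    def __init__(self, z_dim, hidden, rng):
        self.hidden = hidden
        self.W, self.b = init_layer(2 * hidden, z_dim, rng)

    def __call__(self, z):
        out = linear(self.W, self.b, z)
        scale = [1.0 + s for s in out[: self.hidden]]  # identity when out == 0
        shift = out[self.hidden:]
        return scale, shift


class AdaptedMLP:
    """One-hidden-layer MLP whose hidden pre-activations are modulated by an adapter."""

    def __init__(self, n_in, hidden, n_out, rng):
        self.W1, self.b1 = init_layer(hidden, n_in, rng)
        self.W2, self.b2 = init_layer(n_out, hidden, rng)

    def __call__(self, x, adapter):
        scale, shift = adapter
        h = linear(self.W1, self.b1, x)
        h = [math.tanh(s * hi + t) for hi, s, t in zip(h, scale, shift)]
        return linear(self.W2, self.b2, h)


rng = random.Random(0)
obs_dim, act_dim, z_dim, hidden = 3, 2, 4, 8

# Toy context encoder: one linear layer over a standardized, masked transition.
enc_W, enc_b = init_layer(z_dim, 2 * obs_dim + act_dim, rng)
hyper = SharedHypernet(z_dim, hidden, rng)
dynamics = AdaptedMLP(obs_dim + act_dim, hidden, obs_dim, rng)
policy = AdaptedMLP(obs_dim, hidden, act_dim, rng)
critic = AdaptedMLP(obs_dim + act_dim, hidden, 1, rng)

transition = [0.1, 0.2, 0.3, 1.0, -1.0, 0.15, 0.25, 0.35]  # (s, a, s')
n_feat = len(transition)
x = standardize_and_mask(transition, [0.0] * n_feat, [1.0] * n_feat, p_mask=0.2, rng=rng)
z = [math.tanh(v) for v in linear(enc_W, enc_b, x)]

adapter = hyper(z)                       # generated once per context ...
obs = [0.1, 0.2, 0.3]
act = policy(obs, adapter)               # ... and shared by the policy,
next_obs = dynamics(obs + act, adapter)  # the dynamics model,
q_value = critic(obs + act, adapter)     # and the action-value function.
```

Because `adapter` is computed once per inferred context, all three networks receive the same modulation. The scale entries can become negative, so in principle a single context can flip the sign of hidden features for the dynamics model and the policy alike, which is the kind of discontinuous response an actuator inversion calls for.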
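The abstract does not state its variance decomposition. One standard identity such an analysis can build on is the law of total covariance, written here in our own notation as an assumption: z is the inferred context embedding and m is the discrete dynamics mode (for example, nominal versus inverted actuators).

```latex
\operatorname{Cov}[z]
  = \underbrace{\mathbb{E}_{m}\!\left[\operatorname{Cov}[z \mid m]\right]}_{\text{within-mode}}
  + \underbrace{\operatorname{Cov}_{m}\!\left[\mathbb{E}[z \mid m]\right]}_{\text{between-mode}}
```

Within-mode compression shrinks the first term, while the second term, the spread of the per-mode means, can stay large, so z still identifies the mode. The paper's actual policy-gradient variance bounds are not reproduced here.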

Jan Benad, Pradeep Kr. Banerjee, Frank Röder, Nihat Ay, Martin V. Butz, Manfred Eppe • 2026

Related benchmarks

Task	Dataset	Result
Actuator Inversion	BallInCup (C_eval-in)	AER 955	8
Actuator Inversion	DI-Friction (C_train)	AER 71	8
Actuator Inversion	DI-Friction (C_eval-in)	AER 0.71	8
Zero-Shot Actuator Inversion	AIB Cheetah environment (C_eval-out)	AER 225	8
Actuator Inversion	Walker (C_train)	AER 885	8
Actuator Inversion	WalkerGym (C_train)	AER 3.33e+3	8
Actuator Inversion	Walker (C_eval-in)	AER 888	8
Actuator Inversion	WalkerGym (C_eval-in)	AER 3.38e+3	8
Actuator Inversion	HopperGym (C_eval-in)	AER 2.85e+3	8
Zero-Shot Actuator Inversion	AIB DI-Friction environment (C_eval-out)	AER 62	8
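The benchmark rows above are actuator-inversion tasks. As a hedged illustration, not the AIB code, a latent actuator context can be modeled as a sign flip or a permutation applied to the agent's action before it reaches the actuators. The helper names and mode labels are hypothetical. The last function shows, for a 1-D force target f, why a context-blind policy struggles: minimizing E[(σa - f)²], with σ = -1 with probability p and +1 otherwise, gives a = f(1 - 2p), which collapses to zero when inverted and nominal contexts are equally likely.

```python
import random


def make_actuator_context(n_act, mode, rng):
    """Return (perm, signs): actuator i receives signs[i] * action[perm[i]].

    The mode names are hypothetical labels for two kinds of shift the
    abstract lists (actuator inversion and actuator permutation).
    """
    identity = list(range(n_act))
    if mode == "nominal":
        return identity, [1.0] * n_act
    if mode == "inverted":
        return identity, [-1.0] * n_act
    if mode == "permuted":
        perm = identity[:]
        rng.shuffle(perm)
        return perm, [1.0] * n_act
    raise ValueError(f"unknown mode: {mode!r}")


def apply_context(action, perm, signs):
    """Route the agent's action through the latent actuator context."""
    return [s * action[p] for p, s in zip(perm, signs)]


def best_blind_action(target_force, p_inverted):
    """Least-squares action for a policy that cannot see the context.

    Minimizes E[(sigma * a - f)^2] with sigma = -1 w.p. p_inverted, else +1.
    Since sigma^2 = 1, the optimum is a = f * E[sigma] = f * (1 - 2p).
    """
    return target_force * (1.0 - 2.0 * p_inverted)


rng = random.Random(0)
action = [0.5, -1.0, 0.25]
inverted = apply_context(action, *make_actuator_context(3, "inverted", rng))
permuted = apply_context(action, *make_actuator_context(3, "permuted", rng))
print(inverted)                     # [-0.5, 1.0, -0.25]
print(best_blind_action(2.0, 0.5))  # 0.0
```

The averaging effect in `best_blind_action` is the failure mode the abstract attributes to incompatible control responses across contexts: without inferring the context, the best single response is a compromise that serves neither mode.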

