DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration
About
Learned world models excel at interpolative generalization but fail at extrapolative generalization to novel physical properties. This limitation arises because they learn statistical correlations rather than the environment's underlying generative rules, such as physical invariances and conservation laws. We argue that learning these invariances is key to robust extrapolation. To achieve this, we first introduce Symmetry Exploration, an unsupervised exploration strategy where an agent is intrinsically motivated by a Hamiltonian-based curiosity bonus to actively probe and challenge its understanding of conservation laws, thereby collecting physically informative data. Second, we design a Hamiltonian-based world model that learns from the collected data, using a novel self-supervised contrastive objective to identify the invariant physical state from raw, view-dependent pixel observations. Our framework, DreamSAC, trained on this actively curated data, significantly outperforms state-of-the-art baselines in 3D physics simulations on tasks requiring extrapolation.
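To make the idea of a Hamiltonian-based curiosity bonus more concrete, the following is a minimal illustrative sketch and not the formulation used in DreamSAC, which the abstract does not specify. It assumes a hand-written 1-D harmonic-oscillator energy as a stand-in for a learned Hamiltonian, and a hypothetical bonus defined as the absolute change in that energy across a transition. All function names and parameters here are invented for illustration.

```python
def hamiltonian(q, p, mass=1.0, stiffness=1.0):
    """Energy of a 1-D harmonic oscillator (stand-in for a learned H)."""
    return p * p / (2.0 * mass) + 0.5 * stiffness * q * q


def leapfrog_step(q, p, dt, mass=1.0, stiffness=1.0):
    """One leapfrog step of Hamilton's equations for the oscillator."""
    p_half = p - 0.5 * dt * stiffness * q           # dp/dt = -dH/dq
    q_next = q + dt * p_half / mass                 # dq/dt = +dH/dp
    p_next = p_half - 0.5 * dt * stiffness * q_next
    return q_next, p_next


def energy_drift_bonus(state, next_state):
    """Hypothetical curiosity bonus: |H(next) - H(current)|.

    Assumption for illustration only: transitions that appear to violate
    energy conservation under the current model receive a larger bonus.
    """
    return abs(hamiltonian(*next_state) - hamiltonian(*state))


state = (1.0, 0.0)

# A transition that follows the modelled dynamics nearly conserves energy.
consistent = leapfrog_step(*state, dt=0.01)

# A transition with an unexplained momentum kick does not.
surprising = (consistent[0], consistent[1] + 0.5)

bonus_consistent = energy_drift_bonus(state, consistent)
bonus_surprising = energy_drift_bonus(state, surprising)
print(bonus_consistent, bonus_surprising)
```

Under these assumptions, the transition that follows the modelled dynamics earns a bonus close to zero, while the perturbed transition earns a much larger one. This is the sense in which such a bonus would steer an agent toward data that challenges its current model of a conserved quantity.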
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| World model image prediction | DeepMind Control Suite Humanoid | MSE | 4.7776 | 12 |
| Walker Walk | DeepMind Control Suite (in-distribution) | Average Return | 996.5 | 10 |
| World model image prediction | DeepMind Control Suite Cheetah | MSE | 0.1565 | 10 |
| World model image prediction | DeepMind Control Suite Acrobot | MSE | 0.1806 | 10 |
| World model image prediction | GymFetch FetchPush | MSE | 0.302 | 10 |
| World model image prediction | GymFetch FetchReach | MSE | 0.313 | 10 |
| Reinforcement Learning | Hopper stand | Average Episodic Return | 967.9 | 9 |
| World model image prediction | DeepMind Control Suite Hopper | MSE | 0.3149 | 9 |
| World model image prediction | DeepMind Control Suite Walker | MSE | 1.0044 | 9 |
| Extrapolative Generalization | Reacher-hard Unseen View | Mean Reward | 321.9 | 5 |