
GeoWorld: Geometric World Models

About

Energy-based predictive world models provide a powerful approach for multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, degrading rapidly over extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN demonstrate an improvement in success rate (SR) of around 3% in 3-step planning and 2% in 4-step planning over the state-of-the-art V-JEPA 2. Project website: https://steve-zeyu-zhang.github.io/GeoWorld.
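As an illustration of what mapping Euclidean latents onto a hyperbolic manifold can look like, the sketch below uses the Poincaré ball model with the standard exponential map at the origin and the standard geodesic distance. The abstract does not say which hyperbolic model, curvature, or energy function GeoWorld uses, so the choice of the Poincaré ball, the curvature parameter `c`, and the function names `expmap0` and `poincare_distance` are assumptions made for illustration, not the authors' implementation.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean (tangent) vector v into the Poincare ball of curvature -c.

    Assumption: Poincare ball model, exponential map taken at the origin.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball of curvature -c."""
    sq_norm = lambda a: sum(t * t for t in a)
    diff = [a - b for a, b in zip(x, y)]
    numerator = 2.0 * c * sq_norm(diff)
    denominator = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + numerator / denominator) / math.sqrt(c)


# Hypothetical 2-D "latent" vectors standing in for encoder outputs.
state = expmap0([0.3, 0.4])
goal = expmap0([1.5, -2.0])
gap = poincare_distance(state, goal)  # one possible distance-based score
```

Every mapped point lies strictly inside the ball of radius 1/sqrt(c), and distances grow rapidly near the boundary, which is the property that makes hyperbolic space a natural fit for tree-like, hierarchical structure. Whether GeoWorld scores candidate plans with a distance of this form is not stated in the abstract.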

Zeyu Zhang, Danning Li, Ian Reid, Richard Hartley • 2026

Related benchmarks

Task                              Dataset                  Success Rate (SR)  Rank
Goal-conditioned visual planning  CrossTask T=4 88 (test)  37.04              40
Goal-conditioned visual planning  CrossTask T=3 88         47.47              27
Goal-conditioned visual planning  COIN T=3 71              34.85              20
Goal-conditioned visual planning  COIN T=4 71              27.79              20
Goal-conditioned visual planning  CrossTask T=3 88 (test)  51.71              13
Goal-conditioned visual planning  COIN T=3 71 (test)       45.29              13
Goal-conditioned visual planning  COIN T=4 71 (test)       33.29              13
