
Aether: Geometric-Aware Unified World Modeling

About

The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates zero-shot synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Notably, even without real-world data, its reconstruction performance is comparable to or even better than that of domain-specific models. Additionally, Aether employs camera trajectories as geometry-informed action spaces, enabling effective action-conditioned prediction and visual planning. We hope our work inspires the community to explore new frontiers in physically reasonable world modeling and its applications.

Aether Team, Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He • 2025

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Video Depth Estimation | Sintel | Relative Error (Rel) 0.314 | 109
Video Depth Estimation | BONN | Relative Error (Rel) 0.273 | 103
Camera pose estimation | Sintel | ATE 0.189 | 92
Camera pose estimation | ScanNet | ATE RMSE (Avg.) 0.176 | 61
Camera pose estimation | TUM dynamics | RRE 1.106 | 57
Video Depth Estimation | KITTI | Abs Rel 0.054 | 47
Human-centric depth estimation | BONN | Abs Rel 0.273 | 16
Text-to-Video Generation | VBench & UniBench Dataset | Background Consistency 95.28 | 6
Depth Estimation | UniBench | Abs Rel 0.025 | 4
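Several of the depth rows above report Abs Rel (absolute relative error). As a rough illustration of what that number measures, the sketch below computes the usual definition, the mean of |predicted - ground truth| / ground truth over pixels with valid ground truth. The function name `abs_rel` and the sample values are hypothetical, and the benchmark's actual evaluation protocol (for example, any scale or shift alignment or depth clipping applied before scoring) is not stated here and is not reproduced.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute relative depth error over pixels with valid ground truth.

    Assumption: a ground-truth depth <= 0 marks an invalid pixel and is skipped.
    No scale/shift alignment is applied; real benchmarks may align first.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Keep only pixels whose ground-truth depth is positive (valid).
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Hypothetical flattened depth maps (metres); the last pixel has no ground truth.
pred_depth = [1.1, 2.0, 2.7, 5.0]
gt_depth = [1.0, 2.0, 3.0, 0.0]
score = abs_rel(pred_depth, gt_depth)
print(round(score, 4))
```

Lower is better: a score of 0.054, as in the KITTI row, would mean predictions deviate from ground truth by about 5.4% on average under this definition.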
