Aether: Geometric-Aware Unified World Modeling

About

The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates zero-shot synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Notably, even without real-world data, its reconstruction performance is comparable with or even better than that of domain-specific models. Additionally, Aether employs camera trajectories as geometry-informed action spaces, enabling effective action-conditioned prediction and visual planning. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling and its applications.

Aether Team, Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He • 2025

Related benchmarks

Task	Dataset	Result
Video Depth Estimation	Sintel	Delta Threshold Accuracy (1.25): 60.4	235
Camera pose estimation	TUM-dynamic	ATE: 0.092	205
Camera pose estimation	Sintel	ATE: 0.189	203
Video Depth Estimation	KITTI	Abs Rel: 0.054	153
Video Depth Estimation	BONN	AbsRel: 27.3	139
Camera pose estimation	ScanNet	RPE (t): 0.028	133
Video Depth Estimation	BONN	Relative Error (Rel): 0.273	108
Camera pose estimation	TUM dynamics	ATE: 0.092	90
Depth Estimation	KITTI 110 frames	AbsRel: 5.6	75
Depth Estimation	Sintel ~50 frames	AbsRel: 0.324	70

Showing 10 of 52 rows
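
For readers unfamiliar with the depth metrics in the table, the following is a minimal sketch of how absolute relative error (AbsRel) and delta-threshold accuracy at 1.25 are commonly computed. It is not the evaluation code behind these results. The function name `depth_metrics`, the rule that pixels with non-positive ground truth are skipped, and the omission of any scale or shift alignment between prediction and ground truth are assumptions made for illustration. Both values are returned as fractions; some rows above appear to report the same quantities as percentages (for example 27.3 versus 0.273 on BONN).

```python
import math
from typing import Sequence, Tuple


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    threshold: float = 1.25,
) -> Tuple[float, float]:
    """Return (abs_rel, delta_accuracy) as fractions over valid pixels.

    Assumption: a pixel is valid when its ground-truth depth is positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground-truth depth")

    # AbsRel: mean of |pred - gt| / gt over valid pixels.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / len(pairs)

    # Delta accuracy: share of pixels with max(pred/gt, gt/pred) < threshold.
    # A non-positive prediction is counted as outside the threshold.
    def ratio(p: float, g: float) -> float:
        return max(p / g, g / p) if p > 0 else math.inf

    delta = sum(1 for p, g in pairs if ratio(p, g) < threshold) / len(pairs)
    return abs_rel, delta


if __name__ == "__main__":
    abs_rel, delta = depth_metrics([1.0, 2.2, 4.0, 3.0], [1.0, 2.0, 2.0, 0.0])
    print(f"AbsRel={abs_rel:.3f}, delta<1.25={delta:.3f}")
```

Lower AbsRel is better and higher delta accuracy is better, so the two columns should not be ranked in the same direction.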
