
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

About

We present EmerNeRF, a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes. Grounded in neural fields, EmerNeRF simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping. EmerNeRF hinges on two core components. First, it stratifies scenes into static and dynamic fields. This decomposition emerges purely from self-supervision, enabling our model to learn from general, in-the-wild data sources. Second, EmerNeRF parameterizes an induced flow field from the dynamic field and uses this flow field to further aggregate multi-frame features, amplifying the rendering precision of dynamic objects. Coupling these three fields (static, dynamic, and flow) enables EmerNeRF to represent highly dynamic scenes self-sufficiently, without relying on ground-truth object annotations or pre-trained models for dynamic object segmentation or optical flow estimation. Our method achieves state-of-the-art performance in sensor simulation, significantly outperforming previous methods when reconstructing static (+2.93 dB PSNR) and dynamic (+3.70 dB PSNR) scenes. In addition, to bolster EmerNeRF's semantic generalization, we lift 2D visual foundation model features into 4D space-time and address a general positional bias in modern Transformers, significantly boosting 3D perception performance (e.g., a 37.50% average relative improvement in occupancy prediction accuracy). Finally, we construct a diverse and challenging 120-sequence dataset to benchmark neural fields under extreme and highly dynamic settings.
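The two mechanisms described above, static/dynamic decomposition and flow-based multi-frame aggregation, can be illustrated with a minimal NumPy sketch. It assumes the common density-weighted way of merging two radiance fields (densities summed, colors mixed in proportion to each field's density) followed by standard NeRF alpha compositing along a ray. The function names, the `query_dyn` callable, and the 0.25/0.5/0.25 temporal weights are illustrative assumptions, not EmerNeRF's released code.

```python
import numpy as np


def composite_static_dynamic(sigma_s, rgb_s, sigma_d, rgb_d, eps=1e-8):
    """Merge static and dynamic samples into one density and color per sample.

    sigma_*: (N,) non-negative densities; rgb_*: (N, 3) colors.
    Assumption: density-weighted mixing, a common choice for two-field NeRFs.
    """
    sigma = sigma_s + sigma_d
    w_s = sigma_s / (sigma + eps)  # share of density owned by the static field
    w_d = sigma_d / (sigma + eps)  # share of density owned by the dynamic field
    rgb = w_s[:, None] * rgb_s + w_d[:, None] * rgb_d
    return sigma, rgb


def render_ray(sigma, rgb, deltas):
    """Standard NeRF alpha compositing of N ordered samples along one ray."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights


def aggregate_dynamic_features(query_dyn, x, t, flow_fwd, flow_bwd):
    """Hypothetical flow-guided temporal aggregation of dynamic features.

    query_dyn(x, t) -> (N, C) features; flow_fwd / flow_bwd: (N, 3) predicted
    displacements of each point to frames t+1 and t-1. The 0.25/0.5/0.25
    weighting is an assumption for illustration.
    """
    f_prev = query_dyn(x + flow_bwd, t - 1)  # same point, warped to t-1
    f_curr = query_dyn(x, t)
    f_next = query_dyn(x + flow_fwd, t + 1)  # same point, warped to t+1
    return 0.25 * f_prev + 0.5 * f_curr + 0.25 * f_next


# Toy ray: empty space, a grey static surface, then a red dynamic object.
sigma_s = np.array([0.0, 5.0, 0.0])
rgb_s = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.0, 0.0, 0.0]])
sigma_d = np.array([0.0, 0.0, 10.0])
rgb_d = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
sigma, rgb = composite_static_dynamic(sigma_s, rgb_s, sigma_d, rgb_d)
color, weights = render_ray(sigma, rgb, deltas=np.full(3, 0.1))
```

Because each sample's color is split by density share, a region that only the dynamic field explains is rendered entirely from the dynamic branch, which is one way a static/dynamic split can emerge without segmentation labels.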

Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang (2023)

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Scene Reconstruction | nuScenes | PSNR | 26.75 | 17
Novel View Synthesis | nuScenes Shift ± 2 v1.0-trainval (test) | FID | 52.03 | 14
Surrounding View Synthesis | nuScenes v1.0 (test) | PSNR | 26.75 | 11
RGB Reconstruction | nuScenes (val) | PSNR | 30.88 | 10
Out-of-path View Synthesis | CARLA (out-of-path) | PSNR | 21.18 | 8
Depth Estimation | nuScenes Sparse LiDAR GT official (val) | Abs Rel Error | 0.073 | 7
RGB Novel-View Synthesis | nuScenes (val) | PSNR | 20.91 | 7
Novel View Synthesis | Waymo Open Dataset 12 scenes | PSNR | 26.12 | 7
Scene Reconstruction | Waymo Open Dataset 12 scenes | PSNR | 27.15 | 7
View Synthesis | Waymo Static scenes (test) | PSNR | 30.15 | 7

(10 of 25 benchmark rows shown.)
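The table reports PSNR (in dB, higher is better), FID (lower is better), and absolute relative depth error (lower is better). Below is a minimal sketch of the two simpler metrics, assuming images scaled to [0, 1] and depth maps in which a ground-truth value of 0 marks pixels with no LiDAR return. The function names are hypothetical, and individual benchmarks may crop, mask, or cap depth differently.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def abs_rel(pred_depth, gt_depth):
    """Mean |pred - gt| / gt over pixels with valid (gt > 0) sparse LiDAR depth."""
    pred = np.asarray(pred_depth, dtype=float)
    gt = np.asarray(gt_depth, dtype=float)
    valid = gt > 0  # sparse ground truth: zeros mean "no measurement"
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))
```

Because PSNR is logarithmic in MSE, a +2.93 dB gain on the same images, like the static-scene improvement quoted in the abstract, corresponds to roughly a 2x reduction in mean squared error (10^0.293 ≈ 1.96).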
