DriveFix: Spatio-Temporally Coherent Driving Scene Restoration

About

Recent advancements in 4D scene reconstruction, particularly those leveraging diffusion priors, have shown promise for novel view synthesis in autonomous driving. However, these methods often process frames independently or in a view-by-view manner, leading to a critical lack of spatio-temporal synergy. This results in spatial misalignment across cameras and temporal drift in sequences. We propose DriveFix, a novel multi-view restoration framework that ensures spatio-temporal coherence for driving scenes. Our approach employs an interleaved diffusion transformer architecture with specialized blocks to explicitly model both temporal dependencies and cross-camera spatial consistency. By conditioning the generation on historical context and integrating geometry-aware training losses, DriveFix enforces that the restored views adhere to a unified 3D geometry. This enables the consistent propagation of high-fidelity textures and significantly reduces artifacts. Extensive evaluations on the Waymo, nuScenes, and PandaSet datasets demonstrate that DriveFix achieves state-of-the-art performance in both reconstruction and novel view synthesis, marking a substantial step toward robust 4D world modeling for real-world deployment.
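As a rough illustration of what interleaving temporal and cross-camera blocks could mean, the sketch below lists which (frame, camera) tokens would attend to each other in each block type. It is not the authors' implementation: the abstract does not specify the block order, token layout, or attention pattern, so the strict alternation and the function names `attention_groups` and `interleaved_schedule` are assumptions made here for illustration only.

```python
from collections import defaultdict


def attention_groups(num_frames, num_cameras, axis):
    """Group (frame, camera) tokens that would attend to each other.

    axis="temporal": one group per camera, spanning all frames.
    axis="cross_camera": one group per frame, spanning all cameras.
    Hypothetical helper; the grouping rule is an assumption, not the paper's.
    """
    if axis not in ("temporal", "cross_camera"):
        raise ValueError(f"unknown axis: {axis!r}")
    groups = defaultdict(list)
    for t in range(num_frames):
        for v in range(num_cameras):
            key = v if axis == "temporal" else t
            groups[key].append((t, v))
    return [groups[k] for k in sorted(groups)]


def interleaved_schedule(num_blocks):
    """Alternate block types, starting with a temporal block (assumed order)."""
    return ["temporal" if i % 2 == 0 else "cross_camera" for i in range(num_blocks)]


if __name__ == "__main__":
    for block_type in interleaved_schedule(4):
        print(block_type, attention_groups(2, 3, block_type))
```

In this toy layout, a temporal block only mixes information along one camera's frame sequence, and a cross-camera block only mixes information among cameras at one timestep; alternating the two lets information reach every (frame, camera) pair after two blocks.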

Heyu Si, Brandon James Denis, Muyang Sun, Dragos Datcu, Yaoru Li, Xin Jin, Ruiju Fu, Yuliia Tatarinova, Federico Landi, Jie Song, Mingli Song, Qi Guo • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Spatio-temporal Driving Scene Interpolation | Waymo Open Dataset | PSNR | 31.31 | 12 |
| Spatio-temporal Driving Scene Reconstruction | Waymo Open Dataset | PSNR | 34.43 | 12 |
| Generative View Synthesis | PandaSet | FID (2m) | 57.1 | 8 |
| Interpolation | nuScenes | PSNR | 30.67 | 8 |
| View Reconstruction and Interpolation | PandaSet | PSNR | 28.28 | 8 |
| Novel View Synthesis | Waymo Open Dataset | FID (3m) | 74.33 | 8 |
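
Several of the results above are reported as PSNR (peak signal-to-noise ratio, in dB; higher is better). The listing does not say how it was computed (peak value, color space, per-image versus per-sequence averaging), so the following is only a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), assuming 8-bit pixel values flattened into plain lists. The function name `psnr` and its parameters are illustrative.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Standard PSNR in dB between two equal-length sequences of pixel values.

    max_value is the assumed peak signal value (255 for 8-bit images).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Every pixel off by exactly 1, so MSE = 1 and PSNR = 20 * log10(255).
    print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

Because the scale is logarithmic, halving the mean squared error raises PSNR by about 3 dB.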
