NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
About
In this paper, we propose NeoVerse, a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common limitation of scalability in current 4D world modeling methods, caused either by expensive and specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, our NeoVerse is built upon a core philosophy that makes the full pipeline scalable to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online monocular degradation pattern simulation, and other well-aligned techniques. These designs empower NeoVerse with versatility and generalization to various domains. Meanwhile, NeoVerse achieves state-of-the-art performance in standard reconstruction and generation benchmarks. Our project page is available at https://neoverse-4d.github.io.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Reconstruction | DAVIS | PSNR | 25.26 | 29 |
| Novel View Synthesis | NVIDIA | PSNR | 15.86 | 20 |
| Novel View Synthesis | ADT | PSNR | 21.94 | 10 |
| Novel View Synthesis | TUM-D | PSNR | 15.26 | 10 |
| Novel View Synthesis | ExoRecon (held-out frames) | PSNR (Held-out Frames) | 20.03 | 9 |
| Dynamic Reconstruction | DyCheck | PSNR | 11.56 | 8 |
| Dynamic Reconstruction | ADT | PSNR | 32.56 | 7 |
| 4D Camera Control | PREBench Camera-only | Camera Rotation Error | 1.4736 | 7 |
| Novel View Generation | VBench 100 unseen in-the-wild videos 30 | Inference Time (Generation) | 18 | 6 |
| View Synthesis | N3DV Original Input Cameras | PSNR | 24.5 | 6 |
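
Most rows above report PSNR (peak signal-to-noise ratio, in dB; higher is better). As a reading aid, the sketch below shows the conventional PSNR definition, 10 · log10(MAX² / MSE). It assumes pixel intensities normalized to [0, 1] and flattened into plain sequences; the function name `psnr` and its parameters are illustrative, and the exact evaluation protocol behind each benchmark (resolution, masking, color space) is not stated on this page and may differ.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Conventional PSNR in dB between two equally sized flat pixel sequences.

    Assumption: intensities lie in [0, max_value]. This is the textbook
    definition, not necessarily the protocol used by any listed benchmark.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [0.0, 1.0, 0.5, 0.25]
    prediction = [0.1, 0.9, 0.5, 0.25]
    print(f"PSNR: {psnr(ground_truth, prediction):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gap of about 3 dB between two methods corresponds to roughly a factor of two in MSE.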