Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
About
Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feed-forward inference pipeline. Because those feed-forward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing larger overall model capacity against flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness, respectively, both of which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), experimentally achieves state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings. Our code is available at https://github.com/VITA-Group/GNT-MOVE.
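To make the "MoE plus a shared permanent expert" idea concrete, here is a minimal pure-Python sketch of one such layer. It is an illustration only, not the GNT-MOVE implementation: the class name `MoELayerWithPermanentExpert`, the linear experts, the top-k softmax gating, the additive combination with the permanent expert, and all sizes are assumptions, since the abstract does not specify them.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class MoELayerWithPermanentExpert:
    """Hypothetical layer: top-k routed experts plus one always-on expert."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.top_k = top_k
        self.router = rand_matrix(num_experts, dim)  # one routing score per expert
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]
        self.permanent = rand_matrix(dim, dim)       # shared expert, never gated off

    def forward(self, x):
        # Score every expert, then keep only the top-k for this input.
        logits = matvec(self.router, x)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        # Renormalize the gate weights over the selected experts only.
        gates = softmax([logits[i] for i in chosen])

        # The permanent expert processes every input (assumed additive).
        out = matvec(self.permanent, x)
        for gate, idx in zip(gates, chosen):
            expert_out = matvec(self.experts[idx], x)
            out = [o + gate * v for o, v in zip(out, expert_out)]
        return out, chosen, gates


layer = MoELayerWithPermanentExpert(dim=4, num_experts=4, top_k=2)
output, chosen, gates = layer.forward([1.0, 0.5, -0.5, 2.0])
print("selected experts:", chosen)
print("gate weights:", gates)
```

The routed experts can specialize per input, while the permanent expert sees every input and so carries the knowledge shared across scenes. That split is the intuition behind the cross-scene consistency the abstract attributes to the shared permanent expert.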
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | LLFF | PSNR | 26.02 | 124 |
| Novel View Synthesis | NeRF Synthetic | PSNR | 27.47 | 92 |
| Novel View Synthesis | Tanks&Temples | SSIM | 64 | 39 |
| Novel View Synthesis | LLFF 3-shot | PSNR | 19.58 | 17 |
| View Synthesis | Tanks&Temples | PSNR | 20.1 | 15 |
| View Synthesis | Shiny-6 (test) | PSNR | 27.54 | 11 |
| Novel View Synthesis | NMR | PSNR | 32.12 | 5 |
| Novel View Synthesis | LLFF 6-shot | PSNR | 22.36 | 5 |
| Novel View Synthesis | NeRF Synthetic 6-shot | PSNR | 22.39 | 5 |
| Novel View Synthesis | NeRF Synthetic 12-shot | PSNR | 25.25 | 5 |
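
Most results above are reported as PSNR (peak signal-to-noise ratio, in dB), defined as 10 · log10(MAX² / MSE) between a rendered view and its ground-truth image. The sketch below shows that standard formula on flat lists of pixel intensities. The function name `psnr` and the assumption that intensities are scaled to [0, 1] are illustrative choices, not details taken from the benchmark protocols.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """PSNR in dB between two equal-length sequences of pixel intensities.

    Assumes intensities lie in [0, max_value]; higher PSNR means the
    rendered view is closer to the reference.
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives MSE = 0.01, i.e. about 20 dB.
print(psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1]))
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.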