CoNeRF: Controllable Neural Radiance Fields
About
We extend neural 3D representations to allow intuitive and interpretable user control beyond novel view rendering (i.e., camera control). The user annotates which parts of the scene they wish to control with only a small number of mask annotations in the training images. Our key idea is to treat the attributes as latent variables that are regressed by the neural network given the scene encoding. This yields a few-shot learning framework in which attributes are discovered automatically for frames where no annotations are provided. We apply our method to various scenes with different types of controllable attributes (e.g., expression control on human faces, or state control in the movement of inanimate objects). Overall, to the best of our knowledge, we demonstrate for the first time novel view and novel attribute re-rendering of scenes from a single video.
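The idea of regressing attributes from the scene encoding and supervising them only where annotations exist can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the CoNeRF implementation: `attribute_regressor` is a single linear layer with a `tanh` squashing in place of the real network, each frame is assumed to have its own latent code, and, for simplicity, annotations are assumed to be scalar attribute values on a few frames (`None` marks a missing annotation) rather than the mask annotations described above. In a full model this loss would be trained jointly with the rendering loss, which is what lets unannotated attributes be discovered.

```python
import math


def attribute_regressor(latent, weights, biases):
    """Hypothetical regressor: maps one frame's latent code to attribute
    values in [-1, 1] (one linear layer followed by tanh)."""
    return [
        math.tanh(sum(w * z for w, z in zip(row, latent)) + b)
        for row, b in zip(weights, biases)
    ]


def attribute_loss(latents, annotations, weights, biases):
    """Mean squared error between regressed and annotated attribute values.

    `annotations` maps a frame index to a list with one entry per attribute;
    `None` (or a missing frame) means "not annotated". Unannotated attributes
    still receive a regressed value, they are just not supervised here.
    """
    total, count = 0.0, 0
    for frame, latent in enumerate(latents):
        predicted = attribute_regressor(latent, weights, biases)
        targets = annotations.get(frame, [None] * len(predicted))
        for value, target in zip(predicted, targets):
            if target is None:
                continue  # skip: no supervision for this attribute/frame
            total += (value - target) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    latents = [[0.2, -0.1], [0.9, 0.4], [-0.3, 0.7]]  # one latent code per frame
    annotations = {0: [1.0, None], 2: [None, -1.0]}   # only frames 0 and 2 annotated
    weights = [[0.5, -0.2], [0.1, 0.3]]               # two attributes, 2-D latents
    biases = [0.0, 0.0]
    print(attribute_loss(latents, annotations, weights, biases))
    # Frame 1 has no annotation, but the regressor still produces attribute values.
    print(attribute_regressor(latents[1], weights, biases))
```

Because the supervision is only a masked loss term, the same regressor serves annotated and unannotated frames, which is the few-shot setting the paragraph describes.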
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel View Synthesis | Real Data (Interpolation) | PSNR (dB) | 32.342 | 9 |
| Novel view and novel attribute synthesis | Synthetic Data | PSNR (dB) | 32.394 | 3 |
| Novel View Synthesis | Synthetic data (test) | PSNR (dB) | 40.4 | 3 |
| Controllable Scene Synthesis | Real Controllable Scenes Eyes/Mouth | PSNR (dB) | 21.4658 | 2 |
| Controllable Scene Synthesis | Real Controllable Scenes Transformer | PSNR (dB) | 23.0319 | 2 |
| Scene editing | Synthetic data (test) | PSNR (dB) | 39.79 | 2 |
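All results above are reported as PSNR (peak signal-to-noise ratio, in dB; higher is better), defined as PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the rendered and ground-truth images. The sketch below computes it for flattened pixel lists, assuming intensities normalized to [0, 1]; the function name `psnr` is illustrative, and individual benchmarks may differ in details such as per-image averaging or color handling.

```python
import math


def psnr(prediction, target, max_value=1.0):
    """PSNR in dB between two equal-length, flattened pixel sequences."""
    if not prediction or len(prediction) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    rendered = [0.0, 0.5, 1.0, 0.25]
    ground_truth = [0.1, 0.5, 0.9, 0.25]
    print(f"{psnr(rendered, ground_truth):.2f} dB")
```

Because PSNR is logarithmic in the MSE, each 10 dB gain corresponds to a tenfold reduction in mean squared error.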