Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
About
We propose a method for editing NeRF scenes with text instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. We demonstrate that our method can edit large-scale, real-world scenes and accomplish more realistic, targeted edits than prior work.
Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa • 2023
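The alternation described in the abstract (edit the input images, keep optimizing the scene) can be illustrated with a toy sketch. This is not the authors' implementation: the "scene" here is a single scalar, and `render`, `edit_image`, `train_step`, and `edit_and_optimize` are hypothetical stand-ins for NeRF rendering, an InstructPix2Pix call, a NeRF optimization step, and the outer loop. The sketch only shows how slightly inconsistent per-image edits could be consolidated into one shared scene.

```python
import random


def render(scene, view):
    # Hypothetical stand-in for rendering the scene from one camera view.
    # In this toy, every view observes the same scalar "appearance".
    return scene


def edit_image(rendered, rng, target=1.0, strength=0.5, noise=0.02):
    # Hypothetical stand-in for an image-conditioned diffusion edit:
    # move the image part of the way toward the instructed appearance,
    # with a small random error so edits differ between views.
    return rendered + strength * (target - rendered) + rng.uniform(-noise, noise)


def train_step(scene, dataset, lr=0.2):
    # One gradient step on the mean squared error between renders
    # of the current scene and the (partly edited) training images.
    grad = sum(2.0 * (render(scene, v) - img) for v, img in enumerate(dataset))
    return scene - lr * grad / len(dataset)


def edit_and_optimize(num_views=8, iterations=200, seed=0):
    rng = random.Random(seed)
    scene = 0.0                    # unedited scene
    dataset = [0.0] * num_views    # original training images
    for _ in range(iterations):
        v = rng.randrange(num_views)
        # Replace one training image with an edited render of the current scene...
        dataset[v] = edit_image(render(scene, v), rng)
        # ...then continue optimizing the scene on the updated image set.
        scene = train_step(scene, dataset)
    return scene, dataset


if __name__ == "__main__":
    final_scene, final_images = edit_and_optimize()
    print(final_scene, final_images)
```

In this toy, the scene settles close to the edit target even though every individual edit is slightly off, because the optimization averages over all views.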
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| NeRF Colorization | LLFF | CF | 45.599 | 8 |
| Portrait Editing | Tensor4D static scenes | CLIP Similarity | 0.2989 | 7 |
| Super-Resolution | LLFF | PSNR | 20.299 | 6 |
| Local 3D Editing | Evaluation dataset unseen 3D assets (test) | CLIP Similarity | 0.253 | 6 |
| Global 3D Editing | Evaluation dataset unseen 3D assets (test) | CLIP Similarity | 0.239 | 6 |
| Text-driven NeRF Editing | Face, Fangzhou, and Farm (test) | CLIP Dir Sim | 0.2021 | 5 |
| Novel-view stylization | 53 stylizations (Instruct-NeRF2NeRF, GaussCtrl, ScanNet++, Mip-NeRF360, and new scenes) (full evaluation set) | CLIP Direction Similarity | 0.098 | 5 |
| Object Insertion | 35 unique edits (5 scenes x 7 objects) (test) | CLIPScore | 0.2347 | 5 |
| Stylization Semantic Alignment | Rodin 35 examples | CLIP-IQA | 23.93 | 5 |
| 3D Object Editing | Synthetic 3D Fashion Objects (test) | CLIP-Dir-Sim ViT (B/32) | 0.0583 | 4 |
Showing 10 of 15 rows