DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
About
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data. However, this process is often inefficient due to the need for iterative updates of costly 3D representations, such as neural radiance fields, either through individual view edits or score distillation sampling. A major disadvantage of this approach is the slow convergence caused by aggregating inconsistent information across views, as the guidance from 2D models is not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method that addresses these issues in two stages. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. To do so, we propose a training-free approach that integrates cues from the 3D geometry of the underlying scene. Second, given a multi-view consistent edited sequence of images, we directly and efficiently optimize the 3D representation, which is based on 3D Gaussian Splatting. Because it avoids incremental and iterative edits, DGE is significantly more accurate and efficient than existing approaches and offers additional benefits, such as enabling selective editing of parts of the scene.
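The claim that inconsistent 2D guidance slows down or degrades the 3D fit can be illustrated with a toy sketch. This is not the DGE implementation: the "scene" here is just one scalar value per point instead of a set of 3D Gaussians, the visibility masks and noise level are made up, and `fit_shared_colors` is a hypothetical helper. It only shows that a single shared representation can reproduce all edited views exactly when those views agree with each other, and is left with a residual when each view was edited independently.

```python
import numpy as np


def fit_shared_colors(views, masks, steps=200, lr=0.5):
    """Fit one value per scene point to all views by gradient descent.

    views: (V, N) target value of each point in each view.
    masks: (V, N) 1.0 where the point is visible in that view, else 0.0.
    Returns the fitted (N,) values and the mean squared residual.
    """
    colors = np.zeros(masks.shape[1])
    counts = np.maximum(masks.sum(axis=0), 1.0)  # views observing each point
    for _ in range(steps):
        residual = masks * (colors[None, :] - views)
        colors -= lr * residual.sum(axis=0) / counts
    residual = masks * (colors[None, :] - views)
    return colors, float((residual ** 2).sum() / masks.sum())


rng = np.random.default_rng(0)
n_views, n_points = 8, 50
masks = (rng.random((n_views, n_points)) < 0.6).astype(float)
masks[0, :] = 1.0  # make sure every point is seen at least once

edited_scene = rng.random(n_points)  # hypothetical "true" edited scene
# Multi-view consistent edits: every view agrees on each point's value.
consistent = np.tile(edited_scene, (n_views, 1))
# Independent per-view edits: each view disagrees slightly (assumed noise).
inconsistent = consistent + 0.2 * rng.standard_normal((n_views, n_points))

_, loss_consistent = fit_shared_colors(consistent, masks)
_, loss_inconsistent = fit_shared_colors(inconsistent, masks)
print(f"consistent targets:   {loss_consistent:.2e}")
print(f"inconsistent targets: {loss_inconsistent:.2e}")
```

With consistent targets the residual shrinks toward zero, so one direct optimization pass suffices. With inconsistent targets the fit can only average the disagreeing views, which is the situation iterative per-view editing has to work around.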
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-view Consistent Editing | Multi-view Consistent Editing dataset (test) | MEt3R | 0.224 | 7 |
| 3D Scene Editing | 3D Gaussian Splat Editing (evaluation set) | CLIPdir | 0.146 | 6 |
| Geometry Addition | EDIT3D-BENCH (Unseen Assets) | LPIPS | 0.227 | 5 |
| Geometry Removal | EDIT3D-BENCH (Unseen Assets) | LPIPS | 0.168 | 5 |
| 3D Scene Editing | 3D Scene Editing Evaluation Set (full) | Mean CLIP Similarity | 0.259 | 5 |
| Geometry Addition | EDIT3D-BENCH Seen Assets, Unseen Edits | LPIPS | 0.229 | 5 |
| Geometry Removal | EDIT3D-BENCH Seen Assets, Unseen Edits | LPIPS | 0.219 | 5 |
| Novel-view stylization | 53 stylizations (Instruct-NeRF2NeRF, GaussCtrl, ScanNet++, Mip-NeRF360, and new scenes) (full evaluation set) | CLIP Direction Similarity | 0.113 | 5 |
| Texture Editing | EDIT3D-BENCH (Unseen Assets) | LPIPS | 0.233 | 5 |
| Texture Editing | EDIT3D-BENCH Seen Assets, Unseen Edits | LPIPS | 0.265 | 5 |
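For readers unfamiliar with the CLIP directional metrics in the table, the following is a minimal sketch of how such a score is commonly computed: the cosine similarity between the change in image embeddings and the change in text embeddings. It assumes the four embeddings have already been produced by a CLIP encoder, and the function name and argument names are hypothetical. The exact protocol behind each benchmark row (encoder variant, normalization, averaging over views) is not stated here and may differ.

```python
import numpy as np


def clip_directional_similarity(img_src, img_edit, txt_src, txt_edit, eps=1e-8):
    """Cosine similarity between the image-space and text-space edit directions.

    All inputs are 1-D embedding vectors, assumed to come from the same
    CLIP model. Some implementations L2-normalize each embedding before
    taking differences; that step is omitted in this sketch.
    """
    d_img = np.asarray(img_edit, dtype=float) - np.asarray(img_src, dtype=float)
    d_txt = np.asarray(txt_edit, dtype=float) - np.asarray(txt_src, dtype=float)
    denom = np.linalg.norm(d_img) * np.linalg.norm(d_txt) + eps
    return float(d_img @ d_txt / denom)
```

A score near 1 means the rendered images changed in the direction the text edit describes, a score near 0 means the change is unrelated, and a negative score means the images moved the opposite way.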