StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting
About
We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency. Project page: https://kunhao-liu.github.io/StyleGaussian/
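The three steps described above (embedding, transfer, decoding) can be sketched on a toy scene. This is a minimal pure-Python illustration, not the authors' implementation. The abstract does not specify the mapping, the transfer rule, or the decoder layout, so the linear map `W`, the AdaIN-style statistic matching used for the transfer step, the kernel layout in `knn_conv`, and all function and variable names are assumptions made for this example.

```python
import math

def matvec(W, v):
    """Multiply matrix W (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lift(low_feats, W):
    """Embedding: map each Gaussian's low-dim feature to a high-dim one.
    Assumption: the mapping is a plain linear map W."""
    return [matvec(W, f) for f in low_feats]

def channel_stats(feats):
    """Per-channel mean and standard deviation over a set of feature vectors."""
    n, dims = len(feats), len(feats[0])
    mean = [sum(f[c] for f in feats) / n for c in range(dims)]
    std = [math.sqrt(sum((f[c] - mean[c]) ** 2 for f in feats) / n) + 1e-8
           for c in range(dims)]
    return mean, std

def adain(content, style):
    """Transfer: re-normalize content features to the style's channel statistics.
    Assumption: AdaIN stands in for the unspecified transfer step."""
    c_mean, c_std = channel_stats(content)
    s_mean, s_std = channel_stats(style)
    return [[(x - cm) / cs * ss + sm
             for x, cm, cs, sm, ss in zip(f, c_mean, c_std, s_mean, s_std)]
            for f in content]

def knn_indices(positions, k):
    """For each Gaussian, indices of its k nearest Gaussians (itself included)."""
    result = []
    for p in positions:
        order = sorted(range(len(positions)),
                       key=lambda j: math.dist(p, positions[j]))
        result.append(order[:k])
    return result

def knn_conv(positions, feats, kernels):
    """Decoding: one KNN 'convolution' layer acting on Gaussians in 3D.
    kernels[j] (out x in) is applied to the j-th nearest neighbor's feature."""
    out = []
    for nbrs in knn_indices(positions, len(kernels)):
        acc = [0.0] * len(kernels[0])
        for kernel, j in zip(kernels, nbrs):
            acc = [a + b for a, b in zip(acc, matvec(kernel, feats[j]))]
        out.append([1 / (1 + math.exp(-a)) for a in acc])  # squash to (0, 1) as RGB
    return out

# Toy scene: 4 Gaussians with 2-dim stored features, lifted to 4 dims.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
low = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.9], [0.7, 0.4]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
# Hypothetical style features (in practice these would come from a style image).
style = [[0.9, 0.1, 0.4, 0.2], [0.2, 0.8, 0.6, 0.5], [0.5, 0.5, 0.1, 0.9]]
# Two 3x4 kernels: the decoder looks at each Gaussian's 2 nearest neighbors.
kernels = [[[0.1 * (r + c + j) - 0.3 for c in range(4)] for r in range(3)]
           for j in range(2)]

stylized = adain(lift(low, W), style)
rgb = knn_conv(positions, stylized, kernels)
```

Because every operation in this sketch acts on per-Gaussian features in 3D rather than on rendered 2D images, the resulting colors do not depend on the camera, which is the property the abstract attributes to replacing 2D CNN decoding.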
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Stylization | TnT (M60 scene) | ArtScore | 8.63 | 15 |
| 3D Stylization | TnT Truck scene | ArtScore | 5.76 | 15 |
| Multi-view consistency | M60 scene Long-range AnyStyle | LPIPS | 0.091 | 11 |
| Multi-view consistency | M60 scene AnyStyle (short-range) | LPIPS | 0.038 | 11 |
| Multi-view consistency | AnyStyle Scene Long-range (train) | LPIPS | 0.067 | 11 |
| Multi-view consistency | Garden scene Long-range AnyStyle | LPIPS | 0.177 | 11 |
| Multi-view consistency | Truck scene Short-range AnyStyle | LPIPS | 0.031 | 11 |
| Multi-view consistency | Truck scene Long-range AnyStyle | LPIPS | 0.086 | 11 |
| Short-range Multi-view Consistency | Tanks and Temples short-range | Average LPIPS | 0.033 | 11 |
| Multi-view consistency | Garden scene short-range AnyStyle | LPIPS | 0.069 | 11 |