AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting
About
The growing demand for rapid and scalable 3D asset creation has driven interest in feed-forward 3D reconstruction methods, with 3D Gaussian Splatting (3DGS) emerging as an effective scene representation. While recent approaches have demonstrated pose-free reconstruction from unposed image collections, integrating stylization or appearance control into such pipelines remains underexplored. Existing attempts largely rely on image-based conditioning, which limits both controllability and flexibility. In this work, we introduce AnyStyle, a feed-forward 3D reconstruction and stylization framework that enables pose-free, zero-shot stylization through multimodal conditioning. Our method supports both textual and visual style inputs, allowing users to control the scene appearance using natural language descriptions or reference images. We propose a modular stylization architecture that requires only minimal architectural modifications and can be integrated into existing feed-forward 3D reconstruction backbones. Experiments demonstrate that AnyStyle improves style controllability over prior feed-forward stylization methods while preserving high-quality geometric reconstruction. A user study further confirms that AnyStyle achieves superior stylization quality compared to an existing state-of-the-art approach. Repository: https://github.com/joaxkal/AnyStyle.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Stylization | TnT Truck scene | ArtScore | 10.56 | 15 |
| 3D Stylization | TnT (M60 scene) | ArtScore | 10.2 | 15 |
| Multi-view consistency | Garden scene short-range AnyStyle | LPIPS | 0.055 | 11 |
| Short-range Multi-view Consistency | Tanks and Temples short-range | Average LPIPS | 0.029 | 11 |
| Multi-view consistency | Truck scene Short-range AnyStyle | LPIPS | 0.028 | 11 |
| Multi-view consistency | M60 scene AnyStyle (short-range) | LPIPS | 0.036 | 11 |
| Multi-view consistency | AnyStyle Scene Long-range (train) | LPIPS | 0.062 | 11 |
| Multi-view consistency | Truck scene Long-range AnyStyle | LPIPS | 0.079 | 11 |
| Multi-view consistency | Garden scene Long-range AnyStyle | LPIPS | 0.169 | 11 |
| Multi-view consistency | M60 scene Long-range AnyStyle | LPIPS | 0.098 | 11 |