
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles

About

Stylizing 3D scenes instantly while maintaining multi-view consistency and faithfully resembling a style image remains a significant challenge. Current state-of-the-art 3D stylization methods typically involve computationally intensive test-time optimization to transfer artistic features into a pretrained 3D representation, often requiring dense posed input images. In contrast, leveraging recent advances in feed-forward reconstruction models, we demonstrate a novel approach to achieve direct 3D stylization in less than a second using unposed sparse-view scene images and an arbitrary style image. To address the inherent decoupling between reconstruction and stylization, we introduce a branched architecture that separates structure modeling and appearance shading, effectively preventing stylistic transfer from distorting the underlying 3D scene structure. Furthermore, we adapt an identity loss to facilitate pre-training our stylization model through the novel view synthesis task. This strategy also allows our model to retain its original reconstruction capabilities while being fine-tuned for stylization. Comprehensive evaluations, using both in-domain and out-of-domain datasets, demonstrate that our approach produces high-quality stylized 3D content that achieves a superior blend of style and scene appearance, while also outperforming existing methods in terms of multi-view consistency and efficiency.
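
The identity loss mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the model signature, the choice of the first context view as the style input, the mean-squared-error metric, and the two toy models are all assumptions made for illustration. The idea shown is only that when the "style" comes from the scene itself, a well-behaved stylization model should reproduce the scene's original appearance at the target view.

```python
from typing import Callable, List, Sequence

Image = List[float]  # a flattened image, pixel values in [0, 1]
# Hypothetical model signature: (context_views, style_image) -> rendered target view
Model = Callable[[Sequence[Image], Image], Image]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def identity_loss(model: Model, context_views: Sequence[Image], target_view: Image) -> float:
    """Use a view of the scene itself as the style input and penalize any
    deviation of the rendering from the ground-truth target view."""
    style = context_views[0]  # assumption: the first context view doubles as the style
    rendered = model(context_views, style)
    return mse(rendered, target_view)


def toy_model(strength: float) -> Model:
    """Toy stand-in: averages the context views, then blends in the style image."""
    def run(views: Sequence[Image], style: Image) -> Image:
        avg = [sum(px) / len(views) for px in zip(*views)]
        return [(1 - strength) * a + strength * s for a, s in zip(avg, style)]
    return run


def brightening_model(views: Sequence[Image], style: Image) -> Image:
    """Toy stand-in that distorts appearance even when the style matches the scene."""
    return [min(1.0, px + 0.1) for px in views[0]]


views = [[0.2, 0.4, 0.6], [0.2, 0.4, 0.6]]
target = [0.2, 0.4, 0.6]
loss_faithful = identity_loss(toy_model(0.5), views, target)
loss_distorting = identity_loss(brightening_model, views, target)
```

In this sketch the blending model incurs no penalty because blending a scene with its own appearance changes nothing, while the brightening model is penalized for altering the scene. A real system would presumably use rendered images and a richer photometric or perceptual loss.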

Peng Wang, Xiang Liu, Peidong Liu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| 3D Stylization | TnT Truck scene | ArtScore 7.07 | 15 |
| 3D Stylization | TnT (M60 scene) | ArtScore 8.43 | 15 |
| Multi-view consistency | Garden scene Long-range AnyStyle | LPIPS 0.185 | 11 |
| Multi-view consistency | Truck scene Short-range AnyStyle | LPIPS 0.049 | 11 |
| Multi-view consistency | Garden scene short-range AnyStyle | LPIPS 0.085 | 11 |
| Multi-view consistency | AnyStyle Scene Long-range (train) | LPIPS 0.109 | 11 |
| Short-range Multi-view Consistency | Tanks and Temples short-range | Average LPIPS 0.056 | 11 |
| Multi-view consistency | M60 scene AnyStyle (short-range) | LPIPS 0.064 | 11 |
| Multi-view consistency | Truck scene Long-range AnyStyle | LPIPS 0.136 | 11 |
| Multi-view consistency | M60 scene Long-range AnyStyle | LPIPS 0.16 | 11 |
Showing 10 of 19 rows
