OmniStyle2: Learning to Stylize by Learning to Destylize
About
This paper introduces a scalable paradigm for supervised style transfer by inverting the problem: instead of learning to stylize directly, we learn to destylize, reducing stylistic elements from artistic images to recover their natural counterparts and thereby producing authentic, pixel-aligned training pairs at scale. To realize this paradigm, we propose DeStylePipe, a progressive, multi-stage destylization framework that begins with global general destylization, advances to category-wise instruction adaptation, and ultimately deploys specialized model adaptation for complex styles that prompt engineering alone cannot handle. Tightly integrated into this pipeline, DestyleCoT-Filter employs Chain-of-Thought reasoning to assess content preservation and style removal at each stage, routing challenging samples forward while discarding persistently low-quality pairs. Built on this framework, we construct DeStyle-350K, a large-scale dataset aligning diverse artistic styles with their underlying content. We further introduce BCS-Bench, a benchmark featuring balanced content generality and style diversity for systematic evaluation. Extensive experiments demonstrate that models trained on DeStyle-350K achieve superior stylization quality, validating destylization as a reliable and scalable supervision paradigm for style transfer.
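The progressive routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `run_pipeline`, `Pair`, `Destylizer`, and `Judge`, the single shared `threshold`, and the use of two scores in [0, 1] are all assumptions made for illustration. Each stage stands in for one destylization step (general, category-wise, specialized), and the judge stands in for DestyleCoT-Filter's assessment of content preservation and style removal.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-ins: any object could represent an image; strings are used here.
Image = str
# A destylizer maps a stylized image to a candidate natural image.
Destylizer = Callable[[Image], Image]
# A judge returns (content_preservation, style_removal) scores, assumed to lie in [0, 1].
Judge = Callable[[Image, Image], Tuple[float, float]]


@dataclass
class Pair:
    stylized: Image  # original artistic image
    natural: Image   # accepted destylized counterpart
    stage: int       # index of the stage that produced the accepted pair


def run_pipeline(
    image: Image,
    stages: Sequence[Destylizer],
    judge: Judge,
    threshold: float = 0.5,  # assumed acceptance threshold, not taken from the paper
) -> Optional[Pair]:
    """Try each stage in order; accept the first output that passes the judge.

    Samples that fail a stage are routed forward to the next one.
    Samples that fail every stage are discarded (None is returned).
    """
    for idx, destylize in enumerate(stages):
        candidate = destylize(image)
        content, style = judge(image, candidate)
        if content >= threshold and style >= threshold:
            return Pair(stylized=image, natural=candidate, stage=idx)
    return None
```

In this sketch an easy sample exits at the first stage, a harder one is passed on to later stages, and a sample that never passes contributes no training pair.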
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Style Transfer | User Study | Rank 1 Score: 28.19 | 8 |
| Style Transfer | BCS-Bench | DINO: 0.6441 | 8 |
| Image Editing | BCS-Bench | DINO Score: 64.41 | 6 |
| Image Editing | User Study 1.0 (test) | Rank 1 Accuracy (%): 22.56 | 6 |
| Stylized Image Quality Assessment | DeStyle-350K | Style Consistency: 4.54 | 1 |
| Stylized Image Quality Assessment | OmniStyle150K | -- | 1 |