Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization

About

Deep learning models in medical image analysis often struggle with generalizability across domains and demographic groups due to data heterogeneity and scarcity. Traditional augmentation improves robustness, but fails under substantial domain shifts. Recent advances in stylistic augmentation enhance domain generalization by varying image styles, but they either offer limited style diversity or introduce artifacts into the generated images. To address these limitations, we propose Stylizing ViT, a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. This design allows the same attention block to maintain anatomical consistency through self-attention while performing style transfer via cross-attention. We assess the effectiveness of our method for domain generalization by employing it for data augmentation on three distinct image classification tasks in the context of histopathology and dermatology. Results demonstrate improved robustness (up to +13% accuracy) over the state of the art while generating perceptually convincing images without artifacts. Additionally, we show that Stylizing ViT is effective beyond training, achieving a 17% performance improvement during inference when used for test-time augmentation. The source code is available at https://github.com/sdoerrich97/stylizing-vit.
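
The weight-sharing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it is a single-head, pure-Python toy that assumes queries come from the anatomy (content) image tokens and that keys and values come from the style image tokens in the cross-attention case. The class name `SharedAttention`, the helper functions, and the toy token values are all hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


class SharedAttention:
    """Hypothetical single-head attention block.

    The same projection matrices (w_q, w_k, w_v) serve both modes:
    self-attention when no context is passed, cross-attention otherwise.
    """

    def __init__(self, w_q, w_k, w_v):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v

    def __call__(self, query_tokens, context_tokens=None):
        # Self-attention: the tokens attend to themselves.
        if context_tokens is None:
            context_tokens = query_tokens
        q = matmul(query_tokens, self.w_q)
        k = matmul(context_tokens, self.w_k)
        v = matmul(context_tokens, self.w_v)
        d = len(q[0])
        k_t = [list(col) for col in zip(*k)]
        # Scaled dot-product attention scores, one row per query token.
        scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
        return matmul(softmax_rows(scores), v)


# Toy data: identity projections, 2 anatomy tokens, 3 style tokens (assumed values).
identity = [[1.0, 0.0], [0.0, 1.0]]
block = SharedAttention(identity, identity, identity)
anatomy = [[1.0, 0.0], [0.0, 1.0]]
style = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]

self_out = block(anatomy)          # self-attention over anatomy tokens
cross_out = block(anatomy, style)  # cross-attention: anatomy queries, style keys/values
```

In both calls the output has one row per anatomy token, so the spatial layout follows the content image, while in the cross-attention call each row is a weighted mix of style-token values. Only one set of weights exists, which is the property the abstract highlights.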

Sebastian Doerrich, Francesco Di Salvo, Jonas Alle, Christian Ledig • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Disease Classification | Camelyon17-WILDS 1.0 (test) | Test Accuracy | 0.9565 | 7 |
| Disease Classification | Epithelium-Stroma 1.0 (test) | Test Accuracy | 89.07 | 7 |
| Disease Classification | Fitzpatrick17k 1.0 (test) | Test Accuracy | 80.92 | 7 |
| Style Transfer | Camelyon WILDS 17 (train) | FID | 6.2 | 6 |
| Style Transfer | Epithelium-Stroma (train) | FID | 1.5 | 6 |
| Style Transfer | Fitzpatrick17k (train) | FID | 36.9 | 6 |
| Reconstruction | Camelyon17-WILDS (train) | PSNR | 45.4 | 6 |
| Reconstruction | Epithelium-Stroma (train) | PSNR | 39 | 6 |
| Reconstruction | Fitzpatrick17k (train) | PSNR | 31.5 | 6 |
