Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment

About

The capacity of Vision Transformers (ViTs) to handle variable-sized inputs is often constrained by computational complexity and batch processing limitations. Consequently, ViTs are typically trained on small, fixed-size images obtained through downscaling or cropping. While these methods reduce the computational burden, they cause significant information loss, which hurts tasks like image aesthetic assessment. We introduce Charm, a novel tokenization approach that preserves Composition, High-resolution, Aspect Ratio, and Multi-scale information simultaneously. Charm prioritizes high-resolution details in specific regions while downscaling others, enabling shorter fixed-size input sequences for ViTs while incorporating essential information. Charm is designed to be compatible with pre-trained ViTs and their learned positional embeddings. By providing multiscale input and introducing variety to input tokens, Charm improves ViT performance and generalizability for image aesthetic assessment. We avoid cropping or changing the aspect ratio to further preserve information. Extensive experiments demonstrate significant performance improvements on various image aesthetic and quality assessment datasets (up to 8.1%) using a lightweight ViT backbone. Code and pre-trained models are available at https://github.com/FBehrad/Charm.
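To make the idea of keeping some regions at high resolution while downscaling others more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function name `mixed_resolution_tokens`, its parameters, the variance-based region selection, and the requirement that image sides be multiples of `patch * scale` are all assumptions made for illustration. Positional embeddings and aspect-ratio handling are left out.

```python
import numpy as np


def mixed_resolution_tokens(image, patch=16, scale=2, num_highres=4):
    """Illustrative mixed-resolution tokenizer (hypothetical, not Charm itself).

    The image is cut into coarse blocks of side patch * scale. The
    num_highres blocks with the highest pixel variance (an assumed selection
    rule) are split into scale * scale full-resolution patches. Every other
    block is average-pooled down to a single patch. All tokens therefore
    have the same size: patch * patch * channels.
    """
    h, w, c = image.shape
    block = patch * scale
    if h % block or w % block:
        raise ValueError("image sides must be multiples of patch * scale")

    gh, gw = h // block, w // block
    # Rearrange into one (block, block, c) array per coarse block.
    blocks = image.reshape(gh, block, gw, block, c).transpose(0, 2, 1, 3, 4)
    blocks = blocks.reshape(gh * gw, block, block, c).astype(np.float64)

    # Assumed selection rule: keep the most "detailed" blocks at full resolution.
    scores = blocks.reshape(len(blocks), -1).var(axis=1)
    keep = set(np.argsort(-scores, kind="stable")[:num_highres].tolist())

    tokens, scales = [], []
    for idx, blk in enumerate(blocks):
        if idx in keep:
            # Split the block into scale * scale patches at native resolution.
            sub = blk.reshape(scale, patch, scale, patch, c)
            sub = sub.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
            for p in sub:
                tokens.append(p.reshape(-1))
                scales.append(1)
        else:
            # Downscale the whole block to one patch by average pooling.
            low = blk.reshape(patch, scale, patch, scale, c).mean(axis=(1, 3))
            tokens.append(low.reshape(-1))
            scales.append(scale)

    # tokens: (sequence_length, patch * patch * c); scales: per-token scale tag
    return np.stack(tokens), np.array(scales)
```

Under these assumptions, the sequence length is `num_highres * scale**2 + (num_blocks - num_highres)`, which is shorter than tokenizing every block at full resolution (`num_blocks * scale**2`). The per-token scale tag is included only to show that a model could be told which tokens were downscaled.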

Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Rating (Image Aesthetic Assessment) | TAD66K | SRCC 0.411 | 40 |
| Artistic Image Aesthetics Assessment | BAID | SROCC 0.368 | 20 |
| Image Aesthetic Assessment | AADB | SRCC 0.754 | 15 |
| Fine-Grained Aesthetic Assessment (Pair-level) | FGAesthetics Natural | Accuracy 72.3 | 15 |
| Fine-Grained Aesthetic Assessment (Pair-level) | FGAesthetics Cropping | Accuracy 75.5 | 15 |
| Fine-Grained Aesthetic Assessment (Series-level) | FGAesthetics Natural | s-Acc 67 | 15 |
| Fine-Grained Aesthetic Assessment (Series-level) | FGAesthetics Cropping | Series Accuracy 46.7 | 15 |
| Fine-Grained Aesthetic Assessment (Series-level) | FGAesthetics AIGC | Series-level Accuracy 47.8 | 15 |
| Fine-Grained Aesthetic Assessment (Pair-level) | FGAesthetics AIGC | Accuracy 62 | 15 |
| Image Aesthetic Quality Assessment | PARA (test) | SRCC 0.905 | 8 |
Showing 10 of 11 rows
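Several rows above report SRCC (also written SROCC), the Spearman rank-order correlation coefficient between predicted and ground-truth scores. As a reminder of what that number measures, here is a small pure-Python sketch that computes it as the Pearson correlation of average ranks. The helper names (`average_ranks`, `pearson`, `srcc`) are illustrative and are not taken from the Charm code base.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in order[i:j + 1]:
            ranks[k] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def srcc(predicted, target):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(target))
```

Because only ranks matter, a model whose predictions order the images the same way as the human scores reaches an SRCC of 1 even if the raw values differ in scale.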
