A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification

About

Street-view image attribute classification is a vital downstream task of image classification, enabling applications such as autonomous driving, urban analytics, and high-definition map construction. It remains computationally demanding whether training from scratch, initialising from pre-trained weights, or fine-tuning large models. Although pre-trained vision-language models such as CLIP offer rich image representations, existing adaptation or fine-tuning methods often rely on their global image embeddings, limiting their ability to capture fine-grained, localised attributes essential in complex, cluttered street scenes. To address this, we propose CLIP-MHAdapter, a variant of the current lightweight CLIP adaptation paradigm that appends a bottleneck MLP equipped with multi-head self-attention operating on patch tokens to model inter-patch dependencies. With approximately 1.4 million trainable parameters, CLIP-MHAdapter achieves superior or competitive accuracy across eight attribute classification tasks on the Global StreetScapes dataset, attaining new state-of-the-art results while maintaining low computational cost. The code is available at https://github.com/SpaceTimeLab/CLIP-MHAdapter.
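The adapter idea described above (a bottleneck applied to patch tokens, with multi-head self-attention modelling inter-patch dependencies) can be illustrated with a minimal sketch. This is not the authors' implementation, which lives in the linked repository. Everything here is an assumption made for illustration: the function names, the toy dimensions, the ordering of down-projection, attention and up-projection, the omission of biases and normalisation, the residual blend controlled by `alpha`, and the mean pooling at the end. Pure Python is used only to keep the example self-contained; a real model would use a tensor library.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_weight(rows, cols, rng):
    """Small random (rows x cols) weight matrix; biases are omitted for brevity."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def self_attention(x, p, heads):
    """Multi-head self-attention over the rows of x (one row per patch token)."""
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    d = len(q[0]) // heads  # per-head width
    merged = [[] for _ in x]
    for h in range(heads):
        part = slice(h * d, (h + 1) * d)
        qh = [r[part] for r in q]
        kh = [r[part] for r in k]
        vh = [r[part] for r in v]
        # Scaled dot-product scores between every pair of patches.
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in kh]
                  for qi in qh]
        weights = [softmax(r) for r in scores]
        # Concatenate this head's output onto each token's merged vector.
        for i, r in enumerate(matmul(weights, vh)):
            merged[i].extend(r)
    return matmul(merged, p["wo"])


def adapt(tokens, p, heads=2, alpha=0.2):
    """Hypothetical adapter: down-project, attend across patches, up-project, blend, pool."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(tokens, p["down"])]  # ReLU
    mixed = self_attention(hidden, p, heads)
    adapted = matmul(mixed, p["up"])
    # Residual blend (assumed): alpha = 0 leaves the frozen tokens unchanged.
    blended = [[alpha * a + (1 - alpha) * t for a, t in zip(ar, tr)]
               for ar, tr in zip(adapted, tokens)]
    # Mean-pool the patch tokens into a single image feature.
    return [sum(col) / len(blended) for col in zip(*blended)]


# Toy sizes chosen for readability; they are not the paper's settings.
rng = random.Random(0)
num_patches, dim, bottleneck = 4, 8, 4
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_patches)]
params = {
    "down": init_weight(dim, bottleneck, rng),
    "wq": init_weight(bottleneck, bottleneck, rng),
    "wk": init_weight(bottleneck, bottleneck, rng),
    "wv": init_weight(bottleneck, bottleneck, rng),
    "wo": init_weight(bottleneck, bottleneck, rng),
    "up": init_weight(bottleneck, dim, rng),
}
feature = adapt(tokens, params)
```

In this sketch only the small matrices in `params` would be trained, while `tokens` stands in for patch embeddings from a frozen CLIP image encoder. Running attention at the bottleneck width rather than the full token width is one way such a design could keep the trainable parameter count low.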

Qi You, Yitai Cheng, Zichao Zeng, James Haworth • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Street-view Image Attribute Classification | GSS Lighting Condition | Accuracy 96.46 | 7 |
| Street-view Image Attribute Classification | GSS Platform | Accuracy 69.12 | 7 |
| Street-view Image Attribute Classification | GSS View Direction | Accuracy 95.28 | 7 |
| Street-view Image Attribute Classification | GSS Panoramic Status | Accuracy 99.4 | 7 |
| Street-view Image Attribute Classification | GSS Reflection | Accuracy 76.69 | 7 |
| Street-view Image Attribute Classification | GSS Quality | Accuracy 89.08 | 7 |
| Street-view Image Attribute Classification | GSS Weather | Accuracy 81.84 | 7 |
| Street-view Image Attribute Classification | GSS Glare | Accuracy 95.32 | 7 |