Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models

About

Prompt learning has emerged as an efficient alternative to fine-tuning pre-trained vision-language models (VLMs). Despite its promise, current methods still struggle to maintain tail-class discriminability when adapting to class-imbalanced datasets. In this work, we propose cluster-aware neural collapse prompt tuning (CPT), which enhances the discriminability of tail classes in prompt-tuned VLMs without sacrificing their overall generalization. First, we design a cluster-invariant space by mining semantic assignments from the pre-trained VLM and mapping them to prompt-tuned features. This design computes cluster-level boundaries and restricts the constraints to local neighborhoods, which reduces interference with the global semantic structure of the pre-trained VLM. Second, we introduce neural-collapse-driven discriminability optimization with three losses: a textual Equiangular Tight Frame (ETF) separation loss, a class-wise convergence loss, and a rotation stabilization loss. These losses work together to shape intra-cluster geometry for better inter-class separation and intra-class alignment. Extensive experiments on 11 diverse datasets demonstrate that CPT outperforms state-of-the-art methods, with stronger performance on long-tail classes and good generalization to unseen classes.
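The abstract names an Equiangular Tight Frame (ETF) as the target geometry for class separation but does not spell it out. The sketch below illustrates only the standard simplex ETF from the neural collapse literature: K unit vectors whose pairwise cosine is exactly -1/(K-1). The function names (`simplex_etf`, `cosine`, `etf_separation_penalty`) and the squared-gap penalty are hypothetical illustrations, not the loss defined in the paper.

```python
import math


def simplex_etf(num_classes):
    """Return num_classes unit vectors forming a simplex ETF in R^num_classes."""
    if num_classes < 2:
        raise ValueError("a simplex ETF needs at least 2 classes")
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    # Rows of the centred identity (I - 11^T / K), rescaled to unit norm.
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def etf_separation_penalty(class_vectors):
    """Hypothetical penalty (an assumption, not the paper's loss):
    mean squared gap between pairwise cosines and the ETF target -1/(K-1)."""
    k = len(class_vectors)
    target = -1.0 / (k - 1)
    gaps = [
        (cosine(class_vectors[i], class_vectors[j]) - target) ** 2
        for i in range(k)
        for j in range(i + 1, k)
    ]
    return sum(gaps) / len(gaps)
```

Under these assumptions, the penalty is numerically zero for vectors that already form a simplex ETF and grows as the class vectors drift away from equal angular separation; for example, mutually orthogonal vectors are penalized because their pairwise cosine is 0 rather than -1/(K-1).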

Boyang Guo, Liang Li, Lin Peng, Yuhan Gao, Xichun Sheng, Chenggang Yan • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet Domain Generalization (Source: ImageNet, Targets: ImageNetV2, ImageNet-Sketch, ImageNet-A, ImageNet-R) (test) | Accuracy (ImageNetV2) | 64.23 | 105 |
| Base-to-New Classification | 11 downstream datasets Balanced, τ=1 | IN Accuracy | 73.92 | 6 |
| Base-to-New Classification | 11 downstream datasets Imbalanced, τ=0.25 | Accuracy (IN.) | 72.62 | 6 |
| Base-to-New Classification | 11 downstream datasets Highly Imbalanced, τ=0.06 | IN. Score | 71.58 | 6 |
| Image Classification | ImageNet-to-Target Generalization Suite τ=0.25 (test) | IN Accuracy | 69.89 | 6 |
| Image Classification | ImageNet-to-Target Generalization Suite (τ=0.06) (test) | Accuracy (IN) | 69.58 | 6 |
| Image Classification | ImageNet-to-Target Generalization Suite Balance, τ=1 (test) | IN Accuracy | 71.4 | 6 |