Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models

About

Prompt learning has emerged as an efficient alternative to fine-tuning pre-trained vision-language models (VLMs). Despite its promise, current methods still struggle to maintain tail-class discriminability when adapting to class-imbalanced datasets. In this work, we propose cluster-aware neural collapse prompt tuning (CPT), which enhances the discriminability of tail classes in prompt-tuned VLMs without sacrificing their overall generalization. First, we design a cluster-invariant space by mining semantic assignments from the pre-trained VLM and mapping them to prompt-tuned features. This design computes cluster-level boundaries and restricts the constraints to local neighborhoods, which reduces interference with the global semantic structure of the pre-trained VLM. Second, we introduce neural-collapse-driven discriminability optimization with three losses: a textual Equiangular Tight Frame (ETF) separation loss, a class-wise convergence loss, and a rotation stabilization loss. These losses work together to shape intra-cluster geometry for better inter-class separation and intra-class alignment. Extensive experiments on 11 diverse datasets demonstrate that CPT outperforms state-of-the-art methods, with stronger performance on long-tail classes and good generalization to unseen classes.
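
As a rough illustration of the ETF geometry the abstract refers to, the sketch below builds a standard simplex ETF (unit-norm vectors whose pairwise cosine is -1/(K-1)) and scores how far a set of class text features is from that geometry. The function names, the squared-deviation penalty, and the use of NumPy are assumptions made for illustration; the abstract does not give the paper's actual loss formulation, and the cluster-level restriction is not modeled here.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (num_classes, dim) matrix whose rows form a simplex ETF.

    Hypothetical helper: uses the textbook construction
    M = sqrt(K / (K - 1)) * U (I - 11^T / K), with U having orthonormal columns.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if dim < num_classes:
        raise ValueError("this construction assumes dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives orthonormal columns (dim x K).
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    m = np.sqrt(k / (k - 1)) * (u @ centering)  # shape: dim x K
    return m.T


def etf_separation_loss(text_features: np.ndarray) -> float:
    """Mean squared gap between pairwise cosines and the ETF target -1/(K-1).

    Illustrative stand-in only, not the loss defined in the paper.
    """
    k = text_features.shape[0]
    normed = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    cos = normed @ normed.T
    off_diag = cos[~np.eye(k, dtype=bool)]  # drop self-similarities
    return float(np.mean((off_diag + 1.0 / (k - 1)) ** 2))


if __name__ == "__main__":
    etf = simplex_etf(num_classes=5, dim=16)
    print(etf_separation_loss(etf))        # close to zero for an exact ETF
    print(etf_separation_loss(np.eye(5)))  # orthogonal features are penalized
```

Under these assumptions, features that are merely orthogonal still incur a penalty, because an ETF asks for the maximally separated configuration in which every pair of class vectors has the same negative cosine.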

Boyang Guo, Liang Li, Lin Peng, Yuhan Gao, Xichun Sheng, Chenggang Yan • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Image Classification	ImageNet Domain Generalization (Source: ImageNet, Targets: ImageNetV2, ImageNet-Sketch, ImageNet-A, ImageNet-R) (test)	Accuracy (ImageNetV2)	64.23	105
Base-to-New Classification	11 downstream datasets Balanced, τ=1	IN Accuracy	73.92	6
Base-to-New Classification	11 downstream datasets Imbalanced, τ=0.25	Accuracy (IN.)	72.62	6
Base-to-New Classification	11 downstream datasets Highly Imbalanced, τ=0.06	IN. Score	71.58	6
Image Classification	ImageNet-to-Target Generalization Suite τ=0.25 (test)	IN Accuracy	69.89	6
Image Classification	ImageNet-to-Target Generalization Suite (τ=0.06) (test)	Accuracy (IN)	69.58	6
Image Classification	ImageNet-to-Target Generalization Suite Balance, τ=1 (test)	IN Accuracy	71.4	6

