
CLIP-Map: Structured Matrix Mapping for Parameter-Efficient CLIP Compression

About

Contrastive Language-Image Pre-training (CLIP) has been widely applied to various computer vision tasks, e.g., text-to-image generation, image-text retrieval, and image captioning. However, CLIP suffers from high memory and computation costs, which prohibit its use in resource-limited application scenarios. Existing CLIP compression methods typically reduce the size of pre-trained CLIP weights by selecting a subset of them as inherited weights for further retraining, using mask optimization or weight-importance measurement. However, such select-based weight inheritance often compromises feature representation ability, especially under extreme compression. In this paper, we propose a novel mapping-based CLIP compression framework, CLIP-Map. It leverages learnable matrices to map and combine pretrained weights through Full-Mapping with Kronecker Factorization, aiming to preserve as much information from the original weights as possible. To mitigate the optimization challenges introduced by the learnable mapping, we propose Diagonal Inheritance Initialization, which reduces the distribution shift problem and enables efficient and effective mapping learning. Extensive experimental results demonstrate that CLIP-Map outperforms select-based frameworks across various compression ratios, with particularly significant gains under high compression settings.
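The abstract does not give the exact parameterization, so the following is only a minimal sketch under stated assumptions. We assume "Full-Mapping" means a learned linear map from all entries of a pretrained weight matrix W to a smaller compressed weight W', factored into two learnable matrices A and B so that W' = A W B. This factored form is equivalent to one full map on the flattened weights with Kronecker structure, vec(A W B) = (B^T ⊗ A) vec(W), which is far cheaper to store. We also assume "Diagonal Inheritance Initialization" means starting A and B as rectangular identity matrices, so W' initially equals a sub-block of W. All helper names (`map_weight`, `diagonal_init`, etc.) are hypothetical, the matrix sizes are toy values, and no training is performed.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Dense matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def kron(x: Matrix, y: Matrix) -> Matrix:
    """Kronecker product x ⊗ y, shape (m*p) x (n*q)."""
    p, q = len(y), len(y[0])
    return [[x[i // p][j // q] * y[i % p][j % q]
             for j in range(len(x[0]) * q)]
            for i in range(len(x) * p)]


def vec(x: Matrix) -> List[float]:
    """Column-major vectorization: stack the columns of x."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def diagonal_init(rows: int, cols: int) -> Matrix:
    """Rectangular identity (assumed reading of Diagonal Inheritance Init)."""
    return [[1.0 if i == j else 0.0 for j in range(cols)] for i in range(rows)]


def map_weight(a: Matrix, w: Matrix, b: Matrix) -> Matrix:
    """Hypothetical mapping: compressed weight W' = A @ W @ B."""
    return matmul(matmul(a, w), b)


# Toy "pretrained" weight W (4 x 6); real CLIP layers are far larger.
w = [[float(6 * i + j) for j in range(6)] for i in range(4)]

# 1) Diagonal initialization: W' starts as the top-left 2 x 3 block of W,
#    i.e. the mapping begins exactly where selection-based inheritance would.
a0, b0 = diagonal_init(2, 4), diagonal_init(6, 3)
w_init = map_weight(a0, w, b0)
print(w_init)  # [[0.0, 1.0, 2.0], [6.0, 7.0, 8.0]]

# 2) Once A and B are dense, every entry of W' mixes entries from all of W.
#    (Arbitrary stand-in values; no training is performed here.)
a = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 1.0]]
b = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0], [2.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
w_mapped = map_weight(a, w, b)

# 3) The factored mapping equals one full linear map on vec(W) whose matrix
#    has Kronecker structure: vec(A W B) = (B^T ⊗ A) vec(W).
full_map = kron(transpose(b), a)
lhs, rhs = vec(w_mapped), matvec(full_map, vec(w))
assert all(math.isclose(u, v) for u, v in zip(lhs, rhs))

# Parameter cost: dense full map vs. the two Kronecker factors.
full_params = len(full_map) * len(full_map[0])  # (2*3) * (4*6) = 144
factored_params = 2 * 4 + 6 * 3                 # 26
print(full_params, factored_params)  # 144 26
```

Under this reading, the identity-based starting point reproduces a select-based subset exactly, which is one plausible way a learnable mapping could avoid an initial distribution shift before training moves A and B away from the diagonal. The actual CLIP-Map factorization may differ from this sketch.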

Kangjie Zhang, Wenxuan Huang, Xin Zhou, Boxiang Zhou, Dejia Song, Yuan Xie, Baochang Zhang, Lizhuang Ma, Nemo Chen, Xu Tang, Yao Hu, Shaohui Lin • 2026

Related benchmarks

Task                  Dataset             Result               Rank
Image Classification  EuroSAT             --                   497
Image Classification  Flowers102          Accuracy 70.2        478
Image Classification  Stanford Cars       --                   477
Image Classification  SUN397              --                   425
Image Classification  ImageNet 1k (test)  Top-1 Accuracy 63.7  359
Image Classification  CIFAR100            Accuracy 68.3        331
Image Classification  Food101             Accuracy 82.7        309
Image Classification  GTSRB               Accuracy 27          291
Image Classification  MNIST               Accuracy 13          263
Image Classification  RESISC45            --                   263

Showing 10 of 23 rows.
