MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning

About

Parameter-efficient transfer learning (PETL) has emerged as a pivotal paradigm for adapting pre-trained foundation models to downstream tasks. It significantly reduces the number of trainable parameters, yet it still suffers from substantial memory overhead caused by gradient backpropagation during fine-tuning. Memory-efficient transfer learning (METL) circumvents this challenge by bypassing backbone gradient computation via lightweight side networks, but its stringent memory constraint severely limits the learning capacity of the side networks and thereby compromises performance. To address these limitations, we propose a novel Mixed-Precision Interactive Side Mixture-of-Experts framework (MP-ISMoE). Specifically, we first propose a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme that quantizes weights to lower bit-widths while effectively decreasing quantization errors. Leveraging the memory saved by GNP-IQ, we then employ an Interactive Side Mixture-of-Experts (ISMoE) to scale up the side networks without sacrificing overall memory efficiency. Unlike a conventional mixture-of-experts, ISMoE learns to select optimal experts by interacting with salient features from the frozen backbone, thus suppressing knowledge forgetting and boosting performance. Extensive experiments across diverse vision-language and language-only tasks demonstrate that MP-ISMoE markedly improves accuracy over state-of-the-art METL approaches while maintaining comparable parameter and memory efficiency.
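The abstract does not spell out how GNP-IQ works, so the following is only a rough illustration of the general idea it names: quantizing weights to a low bit-width and iteratively using Gaussian noise to search for a setting with lower quantization error. It is not the authors' algorithm. The function names (`quantize`, `noisy_iterative_scale_search`), the choice to perturb a single scale factor, and the parameters `steps` and `sigma` are all assumptions made for this sketch.

```python
import random


def quantize(weights, scale, bits):
    """Symmetric uniform quantization: snap to an integer grid, clamp, dequantize."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for w in weights:
        q = max(-qmax - 1, min(qmax, round(w / scale)))
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def noisy_iterative_scale_search(weights, bits, steps=200, sigma=0.05, seed=0):
    """Hypothetical helper: refine the quantization scale by Gaussian perturbation.

    Starts from the usual max-abs scale, then repeatedly proposes a scale
    perturbed by multiplicative Gaussian noise and keeps the proposal only
    if it lowers the reconstruction error. This is an assumed stand-in for
    illustration, not the GNP-IQ scheme from the paper.
    """
    rng = random.Random(seed)
    qmax = 2 ** (bits - 1) - 1
    best_scale = max(abs(w) for w in weights) / qmax
    base_err = mse(weights, quantize(weights, best_scale, bits))
    best_err = base_err
    for _ in range(steps):
        candidate = best_scale * (1.0 + rng.gauss(0.0, sigma))
        if candidate <= 0:
            continue  # a non-positive scale is meaningless; skip it
        err = mse(weights, quantize(weights, candidate, bits))
        if err < best_err:
            best_scale, best_err = candidate, err
    return best_scale, base_err, best_err


# Toy usage on synthetic weights (not data from the paper).
data_rng = random.Random(1)
weights = [data_rng.gauss(0.0, 1.0) for _ in range(256)]
scale, base_err, refined_err = noisy_iterative_scale_search(weights, bits=4)
print(f"max-abs scale error: {base_err:.6f}")
print(f"refined scale error: {refined_err:.6f}")
```

Because a proposal is accepted only when it reduces the error, the refined error can never exceed the max-abs baseline. The storage saved by the lower bit-width is what the abstract describes as the budget reinvested in larger side networks.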

Yutong Zhang, Zimeng Wu, Shangcai Liao, Shujiang Wu, Jiaxin Chen • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Question Answering | VQA v2 (test-dev) | Overall Accuracy 76.21 | 712 |
| Visual Grounding | RefCOCO+ (val) | Accuracy 73.92 | 253 |
| Visual Grounding | RefCOCO+ (testA) | Accuracy 80.51 | 245 |
| Visual Question Answering | GQA (test-dev) | Accuracy 60.91 | 236 |
| Visual Grounding | RefCOCO+ (testB) | Accuracy 65.02 | 219 |
| Visual Grounding | RefCOCO (val) | Accuracy 83.49 | 172 |
| Visual Grounding | RefCOCO (testA) | Accuracy 87.26 | 162 |
| Visual Grounding | RefCOCO (testB) | Accuracy 79.2 | 159 |
| Visual Grounding | RefCOCOg (val) | Accuracy 78.39 | 158 |
| Visual Grounding | RefCOCOg (test) | Accuracy 78.08 | 155 |
Showing 10 of 18 rows
