
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

About

Upcycled Mixture-of-Experts (MoE) models have shown great potential across various tasks by converting the Feed-Forward Network (FFN) layers of pre-trained dense models into MoE layers. However, these models still suffer from significant parameter inefficiency due to the introduction of multiple experts. In this work, we propose a novel DeRS (Decompose, Replace, and Synthesis) paradigm to overcome this shortcoming, motivated by our observations about the unique redundancy mechanisms of upcycled MoE experts. Specifically, DeRS decomposes the experts into one expert-shared base weight and multiple expert-specific delta weights, and subsequently represents these delta weights in lightweight forms. The proposed DeRS paradigm can be applied to enhance parameter efficiency in two different scenarios: 1) DeRS Compression for the inference stage, using sparsification or quantization to compress vanilla upcycled MoE models; and 2) DeRS Upcycling for the training stage, employing lightweight sparse or low-rank matrices to efficiently upcycle dense models into MoE models. Extensive experiments across three different tasks show that the proposed methods achieve extreme parameter efficiency while maintaining performance for both training and compression of upcycled MoE models.
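The decompose-replace-synthesis idea in the abstract can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the paper's implementation: the function names (`ders_compress`, `synthesize`), the choice of the dense model's FFN weight as the shared base, and the top-k magnitude sparsification of deltas are all assumptions standing in for whichever lightweight form (sparse or quantized) the paper actually uses.

```python
import numpy as np

def ders_compress(expert_weights, base_weight, keep_ratio=0.01):
    """Sketch of DeRS-style compression of upcycled MoE experts.

    Decompose: split each upcycled expert weight W_i into the
    expert-shared base weight W_base plus an expert-specific delta
    D_i = W_i - W_base.
    Replace: approximate each dense delta by a sparse matrix that
    keeps only its largest-magnitude entries.
    """
    compressed = []
    for W in expert_weights:
        delta = W - base_weight
        # keep the top-k entries of the delta by absolute value
        k = max(1, int(keep_ratio * delta.size))
        threshold = np.sort(np.abs(delta), axis=None)[-k]
        sparse_delta = np.where(np.abs(delta) >= threshold, delta, 0.0)
        compressed.append(sparse_delta)
    return compressed

def synthesize(base_weight, sparse_delta):
    # Synthesis: recover an (approximate) expert weight at inference time.
    return base_weight + sparse_delta
```

Under this sketch, only one copy of the base weight is stored for all experts, and each expert adds only its few nonzero delta entries, which is where the parameter savings come from.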

Yongqi Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen • 2025

Related benchmarks

Task | Dataset | Result | Rank
Medical Visual Question Answering | VQA-RAD | -- | 106
Medical Visual Question Answering | PathVQA | -- | 86
Code Generation | HumanEval and MBPP | Overall Average Score: 60.9 | 30
Code Generation | HumanEval and MBPP EvalPlus | HumanEval+ Pass@k: 62.8 | 29
Medical Visual Question Answering | SLAKE (test) | Closed Accuracy: 87.2 | 29
Medical Visual Question Answering | VQA-RAD (test) | -- | 13
Medical Visual Question Answering | PathVQA (test) | -- | 13
Multi-modal Understanding | General Multi-modal Evaluation Suite (VQAv2, GQA, VisWiz, ScienceQA-IMG, TextVQA, POPE, MMBench, MM-Vet) standard (test val) | VQAv2 Accuracy: 77.7 | 9
Medical Visual Question Answering | Medical Multi-Modal Task Suite | Overall Score: 73 | 6
Medical Visual Question Answering | Medical Multi-modal Task Aggregate (test) | Overall Score: 0.729 | 6
