DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

About

With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: https://github.com/declare-lab/della.
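The drop-and-rescale step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `magprune_sketch`, the parameters `p_mean` and `spread`, and the linear mapping from magnitude rank to drop probability are assumptions made for illustration. The abstract states only that lower-magnitude parameters receive higher drop probabilities and that survivors are rescaled by 1/(1 - p).

```python
import numpy as np


def magprune_sketch(delta, p_mean=0.5, spread=0.2, rng=None):
    """Drop low-magnitude delta parameters more often, then rescale survivors.

    p_mean and spread are hypothetical knobs (not from the paper's abstract):
    drop probabilities fall linearly from p_mean + spread/2 for the
    smallest-magnitude entry to p_mean - spread/2 for the largest.
    Returns the pruned array and the per-entry drop probabilities.
    """
    p_hi = p_mean + spread / 2
    p_lo = p_mean - spread / 2
    if not (0.0 <= p_lo and p_hi < 1.0):
        raise ValueError("drop probabilities must lie in [0, 1)")

    rng = np.random.default_rng() if rng is None else rng
    flat = np.asarray(delta, dtype=float).ravel()
    n = flat.size

    # Rank entries by magnitude: rank 0 is the smallest, n - 1 the largest.
    ranks = np.argsort(np.argsort(np.abs(flat)))
    frac = ranks / (n - 1) if n > 1 else np.zeros(n)

    # Assumed linear schedule: lower rank (lower magnitude) -> higher p.
    p = p_hi - spread * frac

    # Sample a keep mask, then rescale survivors by 1 / (1 - p) so that
    # each entry keeps its original value in expectation.
    keep = rng.random(n) >= p
    pruned = np.where(keep, flat / (1.0 - p), 0.0)

    shape = np.shape(delta)
    return pruned.reshape(shape), p.reshape(shape)
```

Calling `magprune_sketch(delta, rng=np.random.default_rng(0))` on a delta tensor (fine-tuned weights minus base weights) returns a sparsified tensor of the same shape. Because each entry is kept with probability 1 - p and scaled by 1/(1 - p) when kept, its expected value equals the original entry under this sketch.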

Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Model Merging | Average of 8 benchmarks | Average Accuracy 47.48 | 72 |
| Named Entity Recognition | MIT Movie | Entity F1 66.72 | 71 |
| Relation Extraction | CoNLL 04 | F1 28.78 | 59 |
| Relation Extraction | CONLL04 | Relation Strict F1 26.6 | 52 |
| Named Entity Recognition | tweetNER7 | Entity F1 55.62 | 49 |
| Natural Language Understanding | GLUE | SST-2 91.1 | 40 |
| Relation Extraction | New York Times | Precision 88.9 | 32 |
| Entity Typing | FindVehicle | Precision 70.42 | 32 |
| Entity Typing | FabNER | Precision 59.71 | 32 |
| Multi-domain evaluation | GSM8K, MATH, HumanEval, MBPP, FinanceBench, ConvFinQA, PubMedQA, and MedQA USMLE | Math Accuracy 27.78 | 24 |
Showing 10 of 15 rows
