MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic

About

The advent of large language models (LLMs) like GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach to MTL: it improves performance across multiple tasks by adding their corresponding task vectors to a pre-trained model. However, no existing method simultaneously achieves optimal performance, computational efficiency, and data privacy, which limits the application of task arithmetic to LLMs. In this paper, we propose Model Exclusive Task Arithmetic for merging GPT-scale models (MetaGPT), which formalizes the objective of model merging within a multi-task learning framework, aiming to minimize the average loss difference between the merged model and each individual task model. Since data privacy limits the use of multi-task training data, we leverage LLMs' local linearity and task vectors' orthogonality to separate the data term from the scaling-coefficient term and derive a model-exclusive task arithmetic method. MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs. Extensive experiments demonstrate that MetaGPT improves on task arithmetic and achieves state-of-the-art performance on multiple tasks.

Yuyan Zhou, Liang Song, Bingning Wang, Weipeng Chen • 2024
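
The abstract describes task arithmetic as adding task vectors to a pre-trained model. The sketch below illustrates only that generic operation on toy weights; it is not the paper's implementation. The function names (`task_vector`, `merge`), the dict-of-lists weight layout, and the fixed coefficients of 0.5 are assumptions made for illustration. In particular, the abstract does not give MetaGPT's formula for the scaling coefficients, so they are passed in by hand here.

```python
def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained, task_vectors, coeffs):
    """Add each task vector, scaled by its coefficient, to the pre-trained weights."""
    merged = {name: list(w) for name, w in pretrained.items()}
    for tau, lam in zip(task_vectors, coeffs):
        for name, delta in tau.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], delta)]
    return merged


# Toy weights (hypothetical): one parameter tensor "w" with three entries.
pretrained = {"w": [1.0, 1.0, 1.0]}
model_a = {"w": [2.0, 1.0, 1.0]}  # fine-tuned on task A
model_b = {"w": [1.0, 3.0, 1.0]}  # fine-tuned on task B

taus = [task_vector(pretrained, m) for m in (model_a, model_b)]

# Coefficients chosen by hand for illustration, not derived from the paper.
merged = merge(pretrained, taus, coeffs=[0.5, 0.5])
print(merged["w"])  # [1.5, 2.0, 1.0]
```

In this toy case the two task vectors are orthogonal (their dot product is zero), which is the property the abstract says the method relies on. With real models the task vectors would be computed per parameter tensor from the checkpoints.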

Related benchmarks

Task                                  Dataset        Result                Rank
Visual Question Answering             GQA            Accuracy 56.53        963
Multimodal Evaluation                 MME            Score 67.62           557
Visual Question Answering             TextVQA (val)  VQA Score 77.18       309
OCR Evaluation                        OCRBench       Score 33.9            296
Visual Question Answering             OKVQA          Top-1 Accuracy 56.54  283
Visual Question Answering             OK-VQA         Accuracy 43.02        224
Multimodal Understanding              SEED-Bench     --                    203
Text-based Visual Question Answering  TextVQA (val)  Accuracy 55.83        146
Visual Question Answering             GQA (test)     Accuracy 59.93        119
Multimodal Reasoning                  MMMU (val)     Accuracy 34.9         114
Showing 10 of 34 rows
