RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models

About

Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. Routing is a promising way to achieve this: a router is learned to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well for a query. To address this problem, we propose a query-based Router by Dual Contrastive learning (RouterDC). The RouterDC model consists of an encoder and LLM embeddings, and we propose two contrastive learning losses to train it. Experimental results show that RouterDC is effective in assembling LLMs and substantially outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. Source code is available at https://github.com/shuhao02/RouterDC.
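As a rough illustration of the routing idea described above (an encoder output compared against per-LLM embeddings), the following minimal sketch picks the LLM whose embedding is most similar to a query embedding. It is not taken from the RouterDC code: the use of cosine similarity with a softmax, the function names (`cosine`, `softmax`, `route`), the model names, and the toy vectors are all assumptions made for illustration, and the encoder itself is replaced by a hand-written vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(query_embedding, llm_embeddings):
    """Return the best-scoring LLM name and the routing distribution.

    Assumption: similarity is cosine similarity; the source only states
    that the router has an encoder and LLM embeddings.
    """
    names = list(llm_embeddings)
    scores = [cosine(query_embedding, llm_embeddings[n]) for n in names]
    probs = softmax(scores)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))


# Hypothetical toy values: in practice the query embedding would come
# from a trained encoder and the LLM embeddings would be learned.
llm_embeddings = {
    "llm_a": [1.0, 0.0],
    "llm_b": [0.0, 1.0],
    "llm_c": [0.7, 0.7],
}
query_embedding = [0.9, 0.1]

chosen, probs = route(query_embedding, llm_embeddings)
print(chosen, probs)
```

In this toy setup the query vector points mostly along the first axis, so the hypothetical `llm_a` receives the highest score. How the embeddings are trained (the two contrastive losses) is not covered by this sketch.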

Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang • 2024

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	--	1398
Question Answering	ARC Challenge	Accuracy 56.7	906
Mathematical Reasoning	MATH	--	882
Mathematical Reasoning	GSM8K (test)	Accuracy 93.68	816
Multitask Language Understanding	MMLU	Accuracy 61	520
Code Generation	MBPP (test)	--	405
Multi-task Language Understanding	MMLU	Accuracy 89	353
Reading Comprehension	RACE high	Accuracy 78.3	295
Mathematical Reasoning	AMC	Accuracy 62.5	221
Multi-hop Question Answering	2Wiki	--	215
