
RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models

About

Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. Routing is a promising way to achieve this: a router is learned to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well on a query. To address this problem, we propose query-based Router by Dual Contrastive learning (RouterDC). The RouterDC model consists of an encoder and a set of LLM embeddings, and is trained with two contrastive learning losses. Experimental results show that RouterDC is effective in assembling LLMs and substantially outperforms both the individual top-performing LLMs and existing routing methods on in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. Source code is available at https://github.com/shuhao02/RouterDC.
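The routing idea described above (a query encoder plus learnable LLM embeddings, with the router picking the LLM whose embedding best matches the query) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in for illustration: the encoder here is a random character-count projection rather than the trained text encoder used in the paper, and `sample_llm_loss` only gestures at the sample-LLM contrastive loss; the actual architecture and both losses are defined in the paper and the linked repository.

```python
import numpy as np

DIM, NUM_LLMS = 8, 4
rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained query encoder: a fixed random
# projection of a bag-of-bytes vector, normalised to unit length.
def encode_query(query: str, dim: int = DIM) -> np.ndarray:
    proj = np.random.default_rng(42).standard_normal((256, dim))
    counts = np.zeros(256)
    for byte in query.encode("utf-8"):
        counts[byte] += 1
    v = counts @ proj
    return v / (np.linalg.norm(v) + 1e-9)

# Learnable LLM embeddings (randomly initialised in this sketch).
llm_embeddings = rng.standard_normal((NUM_LLMS, DIM))
llm_embeddings /= np.linalg.norm(llm_embeddings, axis=1, keepdims=True)

def route(query: str) -> int:
    """Select the LLM whose embedding is most similar to the query embedding."""
    q = encode_query(query)
    scores = llm_embeddings @ q  # cosine similarities (all vectors unit-norm)
    return int(np.argmax(scores))

def sample_llm_loss(q: np.ndarray, positives: list, negatives: list,
                    tau: float = 0.1) -> float:
    """Toy InfoNCE-style sample-LLM loss: pull the query embedding towards
    LLMs that answered it well (positives) and away from those that failed
    (negatives). The paper's second, sample-sample loss (grouping similar
    queries together) is omitted here for brevity."""
    sims = llm_embeddings @ q / tau
    idx = np.array(positives + negatives)
    logsumexp = np.log(np.exp(sims[idx]).sum())
    return float(-(sims[np.array(positives)] - logsumexp).mean())
```

In training, which LLMs count as positives or negatives for a query would come from their observed performance on that query; at inference time only `route` is needed, so the per-query cost is a single encoder forward pass plus a similarity lookup.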

Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K (test) | Accuracy: 93.68 | 751 |
| Reading Comprehension | RACE high | Accuracy: 78.3 | 295 |
| Code Generation | MBPP (test) | -- | 276 |
| Mathematical Reasoning | AMC | Accuracy: 62.5 | 151 |
| Multi-hop Question Answering | MuSiQue | -- | 106 |
| Mathematical Reasoning | AIME 24 | Accuracy: 40 | 84 |
| Question Answering | WebQuestions (WebQs) | Accuracy: 50.8 | 67 |
| Multi-hop Question Answering | Bamboogle | Accuracy: 50.4 | 52 |
| Code | HumanEval | Accuracy: 80.5 | 50 |
| Visual Question Answering | Chest X-ray VQA (test) | Overall Accuracy: 45.61 | 43 |

Showing 10 of 33 rows.
