Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing

About

Recent attempts to combine low-rank adaptation (LoRA) with mixture-of-experts (MoE) for multi-task adaptation of Large Language Models (LLMs) often replace whole attention/FFN layers with switch experts or append parallel expert branches, undermining parameter efficiency and limiting task specialization. We introduce LoRA-Mixer, a modular MoE framework that routes task-specific LoRA experts into the core projection matrices of the attention module, namely input and output linear layers, rather than primarily targeting FFN blocks. The design delivers fine-grained token-level specialization by fully exploiting the attention mechanism, while remaining drop-in compatible with Transformers and state-space models (SSMs), since linear projection layers are ubiquitous. To train robust routers from limited data while promoting stable, selective decisions and high expert reuse, LoRA-Mixer employs an adaptive Routing Specialization Loss (RSL) that jointly enforces global load balance and input-aware specialization via an entropy-shaping objective. The framework supports two regimes: (i) joint optimization of adapters and router with a differentiable hard-soft top-k routing scheme, and (ii) plug-and-play routing over frozen, pre-trained LoRA modules sourced from public repositories. Across 15 benchmarks, including MedQA, GSM8K, HumanEval, and GLUE, RSL-optimized LoRA-Mixer outperforms state-of-the-art routing and LoRA-MoE baselines while using 48 percent of their trainable parameters, with gains of 3.79, 2.90, and 3.95 percentage points on GSM8K, CoLA, and ARC-C, respectively. Cross-model transfer and adapter reuse experiments further demonstrate the approach's versatility and data efficiency. Our code is available at https://github.com/hustcselwb/LoRA-Mixer.

Wenbing Li, Zikai Song, Hang Zhou, Yunyao Zhang, Junqing Yu, Wei Yang• 2025

Related benchmarks

TaskDatasetResultRank
Question AnsweringARC-E
Accuracy89.47
523
Mathematical ReasoningGSM8K
Accuracy (Acc)65.53
337
Question AnsweringARC-C
Accuracy79.89
258
Commonsense ReasoningPIQA
Accuracy84.94
213
Question AnsweringBoolQ
Accuracy79.37
201
Commonsense Question AnsweringARC-E
Accuracy89.88
29
Commonsense Question AnsweringARC Challenge
Accuracy83.24
21
Linguistic AcceptabilityCOLA
Accuracy (CoLA)85.91
21
Natural Language InferenceGLUE
MNLI Accuracy91.45
20
Medical Question Answeringmedical
Score81.55
14
Showing 10 of 17 rows

Other info

Follow for update