LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing

About

Recent attempts to combine low-rank adaptation (LoRA) with mixture-of-experts (MoE) for multi-task adaptation of Large Language Models (LLMs) often replace whole attention/FFN layers with switch experts or append parallel expert branches, undermining parameter efficiency and limiting task specialization. We introduce LoRA-Mixer, a modular MoE framework that routes task-specific LoRA experts into the core projection matrices of the attention module, namely input and output linear layers, rather than primarily targeting FFN blocks. The design delivers fine-grained token-level specialization by fully exploiting the attention mechanism, while remaining drop-in compatible with Transformers and state-space models (SSMs), since linear projection layers are ubiquitous. To train robust routers from limited data while promoting stable, selective decisions and high expert reuse, LoRA-Mixer employs an adaptive Routing Specialization Loss (RSL) that jointly enforces global load balance and input-aware specialization via an entropy-shaping objective. The framework supports two regimes: (i) joint optimization of adapters and router with a differentiable hard-soft top-k routing scheme, and (ii) plug-and-play routing over frozen, pre-trained LoRA modules sourced from public repositories. Across 15 benchmarks, including MedQA, GSM8K, HumanEval, and GLUE, RSL-optimized LoRA-Mixer outperforms state-of-the-art routing and LoRA-MoE baselines while using 48 percent of their trainable parameters, with gains of 3.79, 2.90, and 3.95 percentage points on GSM8K, CoLA, and ARC-C, respectively. Cross-model transfer and adapter reuse experiments further demonstrate the approach's versatility and data efficiency. Our code is available at https://github.com/hustcselwb/LoRA-Mixer.
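The routing idea described above, a gate that mixes several LoRA experts into a single frozen linear projection, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain softmax top-k gating with renormalization in place of the paper's hard-soft top-k scheme, it omits the Routing Specialization Loss entirely, and every name in it (`routed_projection`, `top_k_gates`, `router_w`, `scale`) is hypothetical. It assumes a linear router and computes y = W x + scale * sum_i g_i * B_i (A_i x) for a single token vector.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gates(logits, k):
    """Keep the k largest softmax weights, zero the rest, renormalize to sum to 1."""
    probs = softmax(logits)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

def routed_projection(x, base_w, experts, router_w, k=2, scale=1.0):
    """Frozen projection plus a gated sum of low-rank expert updates (assumed form)."""
    gates = top_k_gates(matvec(router_w, x), k)
    y = matvec(base_w, x)  # frozen base projection W x
    for gate, (a, b) in zip(gates, experts):
        if gate == 0.0:
            continue  # experts outside the top-k contribute nothing
        delta = matvec(b, matvec(a, x))  # low-rank update B (A x)
        y = [yi + scale * gate * di for yi, di in zip(y, delta)]
    return y, gates

def rand_matrix(rows, cols, rng):
    """Small random matrix, used only to build toy weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy setup: 4 experts of rank 2 on an 8-to-8 projection (sizes are arbitrary).
rng = random.Random(0)
d_in, d_out, rank, n_experts = 8, 8, 2, 4
base_w = rand_matrix(d_out, d_in, rng)
experts = [(rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
           for _ in range(n_experts)]
router_w = rand_matrix(n_experts, d_in, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
y, gates = routed_projection(x, base_w, experts, router_w, k=2)
```

In this sketch only the selected experts are evaluated, so the per-token cost grows with k rather than with the total number of experts. In the plug-and-play regime mentioned above, the expert matrices would stay frozen and only `router_w` would be trained.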

Wenbing Li, Zikai Song, Hang Zhou, Yunyao Zhang, Junqing Yu, Wei Yang • 2025

Related benchmarks

Task	Dataset	Result
Medical Question Answering	MedMCQA	Accuracy 57.14	591
Question Answering	ARC-E	Accuracy 89.47	544
Commonsense Reasoning	PIQA	Accuracy 84.94	400
Mathematical Reasoning	GSM8K	Accuracy (Acc) 65.53	352
Question Answering	ARC-C	Accuracy 79.89	283
Question Answering	BoolQ	Accuracy 79.37	233
Math Word Problem Solving	GSM8K	Accuracy 82.15	117
Linguistic Acceptability	CoLA	Accuracy (CoLA) 85.91	108
Boolean Question Answering	BoolQ	Accuracy 83.7	56
Commonsense Question Answering	ARC-E	Accuracy 89.88	29

Showing 10 of 25 rows
