Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

RotMoLE: Enhancing Mixture of Low-Rank Experts through Rotational Gating Mechanism

About

While Large Language Models (LLMs) are commonly fine-tuned to handle domain-specific tasks before being applied to vertical applications, adapting them to complex scenarios with diverse specialized knowledge remains challenging. Meanwhile, Mixture-of-Experts (MoE) architecture has risen as a crucial paradigm for training LLMs, and some recent works have also incorporated MoE into Parameter-Efficient Fine-Tuning (PEFT) to propose the Mixture of Low-rank Experts (MoE-LoRA), to enhance the power of low-rank adapters for learning complicated knowledge. However, conventional gating mechanisms in MoE typically apply only a scalar reweighing to selected experts, thereby limiting their underlying capacity of representation and generalization. Motivated and enabled by the low-rank structures in MoE-LoRA, we propose RotMoLE, a specialized MoE framework for low-rank experts featuring an additional rotation gate. Beyond simple scaling, RotMoLE implements a rotation mechanism for each selected expert, enabling superior expert exploitation and specialization for learning diverse data, especially when expert candidates are limited. Empirical results on complex multi-task and multilingual training scenarios validate our effectiveness.

Mengyang Sun, Maochuan Dou, Tao Feng, Dan Zhang, Yihao Wang, Junpeng Liu, Yifan Zhu, Jie Tang• 2026

Related benchmarks

TaskDatasetResultRank
Question AnsweringOBQA
Accuracy46.6
347
Social Interaction Question AnsweringSIQA
Accuracy47.85
157
Question AnsweringCSQA
Accuracy43
36
Showing 3 of 3 rows

Other info

Follow for update