MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models
About
Mixture-of-Experts (MoE) has emerged as an effective approach to reduce the computational overhead of Transformer architectures by sparsely activating a subset of parameters for each token while preserving high model capacity. This paradigm has recently been extended to Vision-Language Models (VLMs), enabling scalable multi-modal understanding at reduced computational cost. However, the widely adopted deterministic top-K routing mechanism may overlook more optimal expert combinations and lead to expert overfitting. To address this limitation and improve the diversity of expert selection, we propose MoE-GRPO, a reinforcement learning (RL)-based framework for optimizing expert routing in MoE-based VLMs. Specifically, we formulate expert selection as a sequential decision-making problem and optimize it using Group Relative Policy Optimization (GRPO), allowing the model to learn adaptive expert routing policies through exploration and reward-based feedback. Furthermore, we introduce modality-aware router guidance, which enhances training stability and efficiency by discouraging the router from exploring experts that are infrequently activated for a given modality. Extensive experiments on multi-modal image and video benchmarks show that MoE-GRPO consistently outperforms standard top-K routing and its variants by promoting more diverse expert selection, thereby mitigating expert overfitting and enabling task-level expert specialization.
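The core idea, framing expert selection as a policy optimized with group-relative rewards rather than deterministic top-K, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the independent-categorical approximation for sampling expert sets, and the REINFORCE-style surrogate gradient are all assumptions, since the paper's exact reward definition and objective are not given here.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_expert_sets(logits, k, n_rollouts, rng):
    """Stochastically sample n_rollouts candidate expert sets of size k
    from the router distribution (exploration), instead of always taking
    the deterministic top-K experts."""
    p = softmax(logits)
    return [rng.choice(len(logits), size=k, replace=False, p=p)
            for _ in range(n_rollouts)]

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize each rollout's reward
    against the mean/std of its rollout group (the GRPO baseline)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def router_policy_gradient(logits, expert_sets, rewards):
    """REINFORCE-style gradient of the group-relative surrogate w.r.t.
    the router logits. Treats the k selections as independent draws
    (a simplifying assumption, ignoring without-replacement coupling):
    d/dlogits of sum_{e in S} log p_e  =  onehot(S) - k * p."""
    p = softmax(logits)
    adv = grpo_advantages(rewards)
    grad = np.zeros_like(logits)
    for a, s in zip(adv, expert_sets):
        onehot = np.zeros_like(logits)
        onehot[s] = 1.0
        grad += a * (onehot - len(s) * p)
    return grad / len(expert_sets)
```

Expert sets whose rollouts score above the group mean get their logits pushed up, and below-mean sets are pushed down; modality-aware guidance would additionally mask or penalize the logits of experts rarely activated for the current modality before sampling.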
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet V2 | -- | -- | 611 |
| Image Classification | EuroSAT | Accuracy | 58.9 | 569 |
| Image Classification | Flowers102 | Accuracy | 75.9 | 558 |
| Image Classification | DTD | Accuracy | 53.8 | 542 |
| Image Classification | Food101 | Accuracy | 90.5 | 457 |
| Image Classification | SUN397 | Accuracy | 71.1 | 441 |
| Image Classification | Aircraft | Accuracy | 30.0 | 333 |
| Multimodal Understanding | MMStar | -- | -- | 324 |
| Image Classification | StanfordCars | Accuracy | 75.5 | 312 |
| Image Classification | Caltech101 | Accuracy | 95.9 | 228 |