
Phase-Aware Mixture of Experts for Agentic Reinforcement Learning

About

Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods typically use a single policy network, causing a simplicity bias: simple tasks occupy most of the parameters and dominate gradient updates, leaving insufficient capacity for complex tasks. A plausible remedy is the Mixture-of-Experts (MoE) architecture, since MoE allows different parameters (experts) to specialize in different tasks, preventing simple tasks from dominating all parameters. However, a key limitation of traditional MoE is its token-level routing: the router assigns each token to specialized experts independently, which fragments phase-consistent patterns into scattered expert assignments and thus undermines expert specialization. In this paper, we propose Phase-Aware Mixture of Experts (PA-MoE). It features a lightweight phase router that learns latent phase boundaries directly from the RL objective, without pre-defined phase categories. The phase router then gives temporally consistent assignments to the same expert, allowing experts to preserve phase-specific expertise. Experimental results demonstrate the effectiveness of PA-MoE.
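To make the contrast with token-level routing concrete, here is a minimal sketch of phase-consistent routing. It is an illustrative assumption, not the paper's implementation: the function `phase_route`, the boundary heuristic (a large change in per-token phase logits), and the majority-vote segment assignment are all hypothetical choices standing in for the learned phase router.

```python
import numpy as np

def phase_route(phase_logits, boundary_threshold=1.0):
    """Toy phase-aware routing sketch (an assumption, not PA-MoE's exact method).

    Token-level routing would pick the argmax expert for each token
    independently. Here we instead (1) detect phase boundaries where the
    phase logits change sharply between consecutive tokens, then (2) assign
    every token inside a segment to that segment's majority expert, so
    assignments stay temporally consistent within a phase.
    """
    per_token = phase_logits.argmax(axis=-1)  # naive token-level choice
    # L1 change between consecutive rows of logits; large change => boundary
    diffs = np.abs(phase_logits[1:] - phase_logits[:-1]).sum(axis=-1)
    boundaries = (
        [0]
        + [i + 1 for i, d in enumerate(diffs) if d > boundary_threshold]
        + [len(per_token)]
    )
    assignment = per_token.copy()
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        experts, counts = np.unique(per_token[start:end], return_counts=True)
        assignment[start:end] = experts[counts.argmax()]  # one expert per phase
    return assignment
```

With six tokens whose logits favor expert 0, then sharply switch to expert 1, a single noisy mid-phase token that token-level routing would send to expert 1 gets smoothed back to the segment's expert, while the genuine phase change at the sharp boundary is preserved.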

Shengtian Yang, Yu Li, Shuo He, Yewen Li, Qingpeng Cai, Peng Jiang, Lei Feng ((1) Southeast University, (2) Nanyang Technological University, (3) Kuaishou Technology) • 2026

Related benchmarks

Task                          Dataset            Result              Rank
Interactive Decision-making   ALFWorld (test)    Success Rate: 95.3  67
Interactive Decision-making   WebShop (test)     Score: 93.1         28
