
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning

About

Prompt-based methods have recently gained prominence in Continual Learning (CL) due to their strong performance and memory efficiency. A prevalent strategy in this paradigm assigns a dedicated subset of prompts to each task, which, while effective, incurs substantial computational overhead and causes memory requirements to scale linearly with the number of tasks. Conversely, approaches employing a single shared prompt across tasks offer greater efficiency but often suffer from degraded performance due to knowledge interference. To reconcile this trade-off, we propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. Inspired by recent findings on the relationship between Prefix Tuning and Mixture of Experts (MoE), SMoPE organizes a shared prompt into multiple "prompt experts" within a sparse MoE architecture. For each input, only a select subset of relevant experts is activated, effectively mitigating interference. To facilitate expert selection, we introduce a prompt-attention score aggregation mechanism that computes a unified proxy score for each expert, enabling dynamic and sparse activation. Additionally, we propose an adaptive noise mechanism to encourage balanced expert utilization while preserving knowledge from prior tasks. To further enhance expert specialization, we design a prototype-based loss function that leverages prefix keys as implicit memory representations. Extensive experiments across multiple CL benchmarks demonstrate that SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches, all while significantly reducing parameter counts and computational costs.

Minh Le, Bao-Ngoc Dao, Huy Nguyen, Quyen Tran, Anh Nguyen, Nhat Ho • 2025
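
To make the routing idea from the abstract concrete, here is a minimal sketch (in PyTorch, not the authors' released code) of sparse prompt-expert selection: each expert holds a block of prefix key/value prompts, a per-expert proxy score is aggregated from query-to-prefix-key attention, noise is added during training to encourage balanced utilization, and only the top-k experts' prefixes are kept. The class name, tensor shapes, top-k value, and softmax gating are illustrative assumptions.

```python
# Illustrative sketch of sparse prompt-expert routing for prefix tuning.
# Assumed names and shapes; not the paper's reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparsePromptExperts(nn.Module):
    def __init__(self, num_experts=8, prompt_len=4, dim=768, top_k=2, noise_std=0.1):
        super().__init__()
        self.top_k = top_k
        self.noise_std = noise_std
        # Each "prompt expert" owns prompt_len prefix key and value vectors.
        self.prefix_keys = nn.Parameter(torch.randn(num_experts, prompt_len, dim) * 0.02)
        self.prefix_values = nn.Parameter(torch.randn(num_experts, prompt_len, dim) * 0.02)

    def forward(self, queries):
        # queries: (batch, seq_len, dim) from a frozen backbone layer.
        b, s, d = queries.shape
        # Attention scores between input queries and every expert's prefix keys:
        # shape (batch, num_experts, seq_len, prompt_len).
        scores = torch.einsum("bsd,epd->besp", queries, self.prefix_keys) / d ** 0.5
        # Aggregate into a single proxy score per expert (mean over tokens and prompts).
        proxy = scores.mean(dim=(2, 3))  # (batch, num_experts)
        if self.training and self.noise_std > 0:
            # Noise on the routing scores to encourage balanced expert usage.
            proxy = proxy + torch.randn_like(proxy) * self.noise_std
        # Keep only the top-k experts per input; all others get zero gate weight.
        topk_val, topk_idx = proxy.topk(self.top_k, dim=-1)
        gates = torch.zeros_like(proxy).scatter(-1, topk_idx, F.softmax(topk_val, dim=-1))
        # Gate-weighted combination of the selected experts' prefixes.
        keys = torch.einsum("be,epd->bpd", gates, self.prefix_keys)
        values = torch.einsum("be,epd->bpd", gates, self.prefix_values)
        return keys, values  # to be prepended as prefix K/V in the attention layer


# Example usage with random features standing in for backbone activations.
layer = SparsePromptExperts()
prefix_k, prefix_v = layer(torch.randn(2, 16, 768))
```

Because the proxy score is aggregated directly from prompt-key attention, no separate router network is required in this sketch, which is consistent with the paper's use of prefix keys as implicit memory representations that also drive expert selection.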

Related benchmarks

Task                          Dataset                         Result            Rank
Class-incremental learning    ImageNet-R 10-task              FAA 79.32         54
Class-incremental learning    ImageNet-R 5-task               --                45
Class-incremental learning    ImageNet-R 20-task              --                33
Continual Learning            ImageNet-R 10-task split        FAA 79.32         26
Continual Learning            CIFAR-100 (10-task split)       FAA 89.23         18
Continual Learning            CUB-200 (10-task split)         FAA 87.43         12
Class-incremental learning    ImageNet-R 50-task partition    FAA 75.54         10
Continual Learning            ImageNet-R 10-task split        Params (M) 0.38   8
