Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts
About
Continual learning (CL) with large pre-trained models aims to acquire knowledge incrementally without catastrophic forgetting. Existing LoRA-based Mixture-of-Experts (MoE) methods expand capacity by adding isolated new experts while freezing old ones, but they still suffer from redundancy, interference, routing ambiguity, and consequent forgetting. We investigate how these issues stem from coarse-grained expert granularity: coarse-grained experts (e.g., high-rank LoRA) encode low-specialty information, which leads to expert duplication and interference, and to routing degradation and confusion as experts accumulate. In this work, we propose MoRAM (Mixture of Rank-1 Associative Memory). Grounded in the view that weight matrices act as linear associative memories, MoRAM realizes CL as the incremental expansion of a memory of reusable, atomic rank-1 experts. Each rank-1 adapter acts as a fine-grained MoE expert, or equivalently as an associative memory unit. By viewing rank-1 experts as key-value memory pairs, we replace explicit MoE-LoRA routers with self-activation, in which each memory atom evaluates its own relevance via its intrinsic key. Inference thus becomes content-addressable retrieval and recall over the incrementally accumulated memory of learning snapshots. Extensive experiments on CLIP and LLMs show that MoRAM significantly outperforms state-of-the-art methods, achieving a better plasticity-stability trade-off, stronger generalization, and reduced forgetting. Project Page: https://artificer-ai-lab.github.io/MoRAM/.
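To make the key-value reading of a rank-1 adapter concrete, the following is a minimal sketch, not the authors' implementation. It relies on the identity (v kᵀ) x = (k · x) v: a rank-1 update applied to an input x returns the value vector v scaled by how well x matches the key k. The function name `recall`, the `top_k` parameter, and the rule of keeping the atoms with the largest |k · x| are hypothetical choices made for illustration; the abstract does not specify how self-activation is computed or sparsified.

```python
from typing import List, Tuple

Vector = List[float]
Atom = Tuple[Vector, Vector]  # (key, value) pair, i.e. one rank-1 expert


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def recall(x: Vector, memory: List[Atom], top_k: int = 2) -> Vector:
    """Content-addressable recall over rank-1 (key, value) atoms.

    ASSUMPTION: atoms are selected by the magnitude of key . x and the
    top_k are kept. This stands in for the paper's self-activation rule,
    which the abstract does not describe in detail.
    """
    # Each atom scores its own relevance with its key; no separate router.
    scores = [dot(key, x) for key, _ in memory]
    # Hypothetical sparsification: keep the top_k atoms by |score|.
    active = sorted(range(len(memory)),
                    key=lambda i: abs(scores[i]), reverse=True)[:top_k]
    out = [0.0] * len(memory[0][1])
    for i in active:
        _, value = memory[i]
        # Rank-1 update applied to x: (value key^T) x = (key . x) * value
        for j, v in enumerate(value):
            out[j] += scores[i] * v
    return out


# Incremental expansion: earlier atoms stay untouched, new ones are appended.
memory: List[Atom] = [([1.0, 0.0], [1.0, 1.0]),
                      ([0.0, 1.0], [2.0, 0.0])]
memory.append(([1.0, 1.0], [0.0, 3.0]))  # atom added for a later task

delta = recall([2.0, 0.5], memory, top_k=2)
print(delta)  # [2.0, 9.5]
```

With `top_k` equal to the number of atoms, the result coincides with multiplying x by the dense update formed by summing all value-key outer products, so in this sketch the selection step is the only thing that distinguishes sparse recall from a merged high-rank adapter.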
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Continual Learning | TRACE | BWT (%) | 3.12 | 124 |
| Continual Learning | Standard CL Benchmark | Avg Final Acc | 0.776 | 71 |
| Continual Learning | Standard CL benchmark (Yelp, Amazon, DBpedia, Yahoo, AG News) latest (test) | Accuracy (CL Suite Test) | 79.3 | 57 |
| Continual Learning | Large Number of Tasks | Average Performance | 69.7 | 50 |
| Multi-domain Task-Incremental Learning | MTIL Order I 5-shot (test) | Accuracy (Caltech101) | 95.4 | 46 |
| Continual Learning | Continual Learning Benchmark 15-Task | Average Accuracy | 68.32 | 28 |
| Continual Learning | X-TAIL | Average Score | 80.9 | 27 |
| Continual Learning | SuperNI | AP | 51.79 | 13 |
| Continual Learning | 15-task Sequence Order-6 | Average Accuracy | 71.95 | 12 |
| Image Classification | X-TAIL Average | Aircraft Accuracy | 81.6 | 12 |