MLLM-CL: Continual Learning for Multimodal Large Language Models
About
Recent Multimodal Large Language Models (MLLMs) excel in vision-language understanding but face challenges in adapting to dynamic real-world scenarios that require continuous integration of new knowledge and skills. While continual learning (CL) offers a potential solution, existing benchmarks and methods suffer from critical limitations. In this paper, we introduce MLLM-CL, a novel benchmark encompassing domain and ability continual learning, where the former focuses on independently and identically distributed (IID) evaluation across evolving mainstream domains, whereas the latter evaluates on non-IID scenarios with new model abilities. Methodologically, we propose preventing catastrophic interference through parameter isolation and an MLLM-based routing mechanism. Extensive experiments demonstrate that our approach can integrate domain-specific knowledge and functional abilities with minimal forgetting, significantly outperforming existing methods. Our benchmark and code are available at https://github.com/bjzhb666/MLLM-CL.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Continual Learning | MLLM-CL | RS Last Score79.87 | 18 | |
| Domain-incremental learning | MLLM-CL Domain | RS Score80.87 | 17 | |
| Continual Learning | MLLM-CL Ability | OCR Score33.7 | 17 | |
| Continual Learning | MLLM-CL (test) | RS Score80.9 | 13 |