CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model

About

Instruction tuning is a prevalent strategy used by Multimodal Large Language Models (MLLMs) to align with human instructions and adapt to new tasks. Nevertheless, MLLMs face the challenge of adapting to users' evolving knowledge and demands, so how to retain existing skills while acquiring new knowledge needs to be investigated. In this paper, we present a comprehensive benchmark, Continual Instruction tuNing (CoIN), to assess existing MLLMs in the sequential instruction tuning paradigm. CoIN comprises 10 commonly used datasets spanning 8 task categories, ensuring a diverse range of instructions and tasks. The trained model is evaluated from two aspects: Instruction Following, which assesses alignment with human intention, and General Knowledge, which assesses the knowledge preserved for reasoning. Experiments on CoIN demonstrate that current powerful MLLMs still suffer from catastrophic forgetting, and that the failure in intention alignment, rather than knowledge forgetting, is mainly responsible. To this end, we introduce MoELoRA to MLLMs, which is effective at retaining the previous instruction alignment. Experimental results consistently show that this method reduces forgetting on CoIN.

Cheng Chen, Junchen Zhu, Xu Luo, Hengtao Shen, Lianli Gao, Jingkuan Song • 2024
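
The catastrophic forgetting described in the abstract is often quantified with backward transfer (BWT): the average change in score on each earlier task between the moment it was learned and the end of the whole training sequence. The sketch below uses the common definition BWT = 1/(T-1) * sum over i < T of (R[T][i] - R[i][i]). The function name and the score matrix are hypothetical and made up for illustration; whether any BWT figure reported for CoIN was computed in exactly this way is an assumption.

```python
def backward_transfer(acc):
    """Average backward transfer over a sequence of tasks.

    acc[j][i] is the score on task i measured right after training on
    task j (0-indexed). Negative values indicate forgetting.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("backward transfer needs at least two tasks")
    final_scores = acc[num_tasks - 1]
    # Compare the final score on each earlier task with the score
    # obtained immediately after that task was learned.
    diffs = [final_scores[i] - acc[i][i] for i in range(num_tasks - 1)]
    return sum(diffs) / (num_tasks - 1)


# Hypothetical scores for three sequentially tuned tasks (not real results).
acc = [
    [80.0, 0.0, 0.0],
    [60.0, 70.0, 0.0],
    [50.0, 55.0, 90.0],
]
bwt = backward_transfer(acc)
print(bwt)
```

In this made-up example the first task drops from 80 to 50 and the second from 70 to 55, so the average change across the two earlier tasks is negative, which is the signature of forgetting.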

Related benchmarks

Task	Dataset	Result
Embodied Navigation	LENL (test)	SR-F (S1) 86	44
Multimodal Continual Instruction Tuning	UCIT (Unified Continual Instruction Tuning)	Average Score (UCIT) 60.79	40
Continual Instruction Tuning	UCIT	Image-R Score 70.5	30
Anomaly Detection	Anomaly-ShapeNet	--	30
Lifelong Embodied Navigation	LENL (test)	S1 Success Rate 97	22
Robotic Manipulation	LLCRM 1.0 (test)	S1 Score 93	22
Continual Instruction Tuning	MLLM-DCL	RS Score 76.96	20
Continual Learning	COIN	Backward Transfer (BWT) -22.18	20
Multimodal Continual Learning	Overall 15 Chunks	MAP 59.49	18
Multimodal Continual Learning	Overall 20 Chunks	MAP 60.67	18

Showing 10 of 48 rows
