SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models

About

Continual learning (CL) ability is vital for deploying large language models (LLMs) in a dynamic world. Existing methods devise a learning module that acquires task-specific knowledge with a parameter-efficient tuning (PET) block and a selection module that picks the corresponding block for each test input, aiming to handle the challenges of catastrophic forgetting and knowledge transfer in CL. However, these methods tend to address only one of the challenges, ignoring the potential of aligning the two modules to address catastrophic forgetting and knowledge transfer simultaneously. To this end, we propose a novel Shared Attention Framework (SAPT) that aligns PET learning and selection via the Shared Attentive Learning & Selection module. Extensive experiments on two CL benchmarks demonstrate the superiority of SAPT. Moreover, SAPT consistently demonstrates its superiority when scaled to different model sizes (from 770M to 13B), different model architectures (T5 and LLaMA-2), and unseen tasks.
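
The abstract does not spell out how the shared module works, so the following is only a toy sketch of the general idea it names: one set of attention weights over per-task PET blocks is used both to mix block outputs (learning) and to pick a block for a test input (selection). Every name here (`attention_weights`, `mix_pet_outputs`, `task_keys`, `pet_outputs`) and the scaled dot-product scoring are assumptions made for illustration, not the paper's implementation.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: List[float], keys: List[List[float]]) -> List[float]:
    """Scaled dot-product attention of one input query over per-task keys (assumed scoring)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def mix_pet_outputs(weights: List[float], pet_outputs: List[List[float]]) -> List[float]:
    """Attention-weighted sum of the outputs of each task's PET block."""
    dim = len(pet_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, pet_outputs)) for d in range(dim)]


# Hypothetical toy values: two tasks, two-dimensional features.
query = [1.0, 0.0]                       # representation of the current input
task_keys = [[1.0, 0.0], [0.0, 1.0]]     # one key per learned task
pet_outputs = [[1.0, 1.0], [3.0, 3.0]]   # output of each task's PET block

weights = attention_weights(query, task_keys)
mixed = mix_pet_outputs(weights, pet_outputs)                  # used while learning
selected = max(range(len(weights)), key=weights.__getitem__)   # used at test time
```

Because the same `weights` drive both the mixing and the selection, the two steps cannot drift apart, which is the alignment the abstract argues for.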

Weixiang Zhao, Shilong Wang, Yulin Hu, Yanyan Zhao, Bing Qin, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che • 2024

Related benchmarks

Task                Dataset                               Metric                   Result  Rank
Continual Learning  Large Number of Tasks                 Average Performance      81.9    50
Continual Learning  Standard CL Benchmark                 Avg Final Acc            0.811   50
Continual Learning  Long Sequence (test)                  AP                       82.02   15
Continual Learning  Long Sequence Benchmark               OP                       82      14
Continual Learning  SuperNI Benchmark                     Average Score            50.9    14
Continual Learning  SuperNI (test)                        AP                       56.23   13
Continual Learning  Large Number of Tasks (test)          Backward Transfer (BWT)  -2.9    13
Continual Learning  SuperNI Standard CL Benchmark (test)  Average Performance      81.9    13
Continual Learning  SuperNI Large Number of Tasks (test)  Average Performance      82.1    13
Continual Learning  SuperNI                               BWT                      -0.56   10
Showing 10 of 14 rows
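
This page does not define the metrics it reports. As a minimal sketch, assuming the definitions commonly used in the continual learning literature, average performance is the mean score over all tasks after training on the last task, and backward transfer (BWT) is the mean change in score on each earlier task between the moment it was learned and the end of training. The score matrix `acc` and its values are hypothetical and are not taken from the paper.

```python
from typing import List


def average_performance(acc: List[List[float]]) -> float:
    """Mean score over all tasks, measured after training on the final task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def backward_transfer(acc: List[List[float]]) -> float:
    """Mean of (final score - score right after learning) over all but the last task."""
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("BWT needs at least two tasks")
    deltas = [acc[-1][i] - acc[i][i] for i in range(num_tasks - 1)]
    return sum(deltas) / (num_tasks - 1)


# Hypothetical scores: acc[t][i] is the score on task i after training on task t.
acc = [
    [80.0, 0.0, 0.0],
    [78.0, 85.0, 0.0],
    [77.0, 83.0, 90.0],
]

ap = average_performance(acc)   # (77 + 83 + 90) / 3
bwt = backward_transfer(acc)    # ((77 - 80) + (83 - 85)) / 2
```

Under these definitions a negative BWT, as in the rows above, means that scores on earlier tasks dropped after later tasks were learned, so values closer to zero indicate less forgetting.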
