
Soft-TransFormers for Continual Learning

About

Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), we introduce Soft-Transformer (Soft-TF), a parameter-efficient framework for continual learning that leverages soft, real-valued subnetworks over a frozen pre-trained Transformer. Instead of relying on manually designed prompts or adapters, Soft-TF learns task-specific multiplicative masks applied to the key, query, value, and output projections in self-attention. These masks enable smooth and stable task adaptation while preserving shared representations. Combined with a lightweight dual-prompt mechanism, Soft-TF maintains strong knowledge retention and mitigates Catastrophic Forgetting (CF). Across multiple continual learning benchmarks, Soft-TF achieves state-of-the-art performance, consistently outperforming prompt-based, adapter-based, and LoRA-style baselines while requiring minimal additional parameters.
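As a rough illustration of the masking idea described above, the following minimal NumPy sketch applies task-specific, real-valued multiplicative masks to the query, key, value, and output projections of a single-head self-attention layer whose weights stay frozen. It is not the authors' implementation: the class name `SoftMaskedAttention`, the all-ones mask initialization, the random stand-in for pre-trained weights, and the single-head layout are all assumptions made for illustration, and the dual-prompt mechanism and mask training are omitted.

```python
import numpy as np


class SoftMaskedAttention:
    """Single-head self-attention with frozen weights and per-task soft masks.

    Hypothetical sketch: names and initialization are illustrative assumptions,
    not taken from the paper.
    """

    PROJECTIONS = ("q", "k", "v", "o")

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-in for pre-trained weights; these are never updated.
        self.frozen = {
            p: rng.normal(scale=dim ** -0.5, size=(dim, dim))
            for p in self.PROJECTIONS
        }
        # task_id -> {projection name -> real-valued mask of the same shape}
        self.masks = {}

    def add_task(self, task_id):
        # Assumed initialization: all-ones masks, so a new task starts out
        # reproducing the frozen pre-trained layer exactly.
        self.masks[task_id] = {p: np.ones_like(w) for p, w in self.frozen.items()}

    def _weight(self, task_id, p):
        # Effective weight = frozen weight * task-specific soft mask (elementwise).
        return self.frozen[p] * self.masks[task_id][p]

    def forward(self, x, task_id):
        # x has shape (sequence_length, dim).
        q = x @ self._weight(task_id, "q")
        k = x @ self._weight(task_id, "k")
        v = x @ self._weight(task_id, "v")
        scores = q @ k.T / np.sqrt(x.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        return (attn @ v) @ self._weight(task_id, "o")


layer = SoftMaskedAttention(dim=8)
layer.add_task(0)
layer.add_task(1)
tokens = np.random.default_rng(1).normal(size=(5, 8))
out_task0 = layer.forward(tokens, task_id=0)
# Adapting task 1 only touches its own masks; task 0 and the frozen weights are unaffected.
layer.masks[1]["v"] *= 0.5
out_task1 = layer.forward(tokens, task_id=1)
```

Because each task owns its masks and the shared weights are never modified, changing the masks for one task cannot alter the outputs for an earlier task in this sketch, which is the intuition behind using such masks to limit forgetting.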

Haeyong Kang, Chang D. Yoo • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Class-incremental learning | CIFAR-100 (10-split) | Accuracy | 97.87 | 63 |
| Continual Learning | CIFAR-100 (10-split) | ACC | 92.35 | 54 |
| Class-incremental learning | 5-Datasets | FAA | 95.68 | 49 |
| Class-incremental learning | CUB-200 Split | FAA | 97.9 | 45 |
| Class-incremental learning | Split ImageNet-R 10 incremental tasks | Class Accuracy | 82.38 | 40 |
| Class-incremental learning | CIFAR100 20-Split | Accuracy | 99.05 | 38 |
| Class-incremental learning | CIFAR100 10-Split | Accuracy (ACC) | 98.25 | 22 |
| Class-incremental learning | ImageNet-R (10-Split) | Accuracy | 91.94 | 22 |
