Soft-TransFormers for Continual Learning
About
Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), we introduce Soft-Transformer (Soft-TF), a parameter-efficient framework for continual learning that leverages soft, real-valued subnetworks over a frozen pre-trained Transformer. Instead of relying on manually designed prompts or adapters, Soft-TF learns task-specific multiplicative masks applied to the key, query, value, and output projections in self-attention. These masks enable smooth and stable task adaptation while preserving shared representations. Combined with a lightweight dual-prompt mechanism, Soft-TF maintains strong knowledge retention and mitigates catastrophic forgetting (CF). Across multiple continual learning benchmarks, Soft-TF achieves state-of-the-art performance, consistently outperforming prompt-based, adapter-based, and LoRA-style baselines while requiring minimal additional parameters.
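To make the masking idea concrete, here is a minimal single-head sketch in NumPy. It is an illustration only, not the authors' implementation: the abstract does not specify the mask shape or initialisation, so the element-wise masks initialised to ones, the helper names (`new_task_masks`, `attention`), the random stand-in weights, and the random perturbation that stands in for a training step are all assumptions. The dual-prompt mechanism is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Frozen "pre-trained" projection weights (random stand-ins for illustration).
frozen = {name: rng.standard_normal((d_model, d_model)) for name in ("q", "k", "v", "o")}

def new_task_masks():
    # Assumption: one real-valued mask per projection, initialised to ones so
    # the masked layer starts out identical to the frozen pre-trained layer.
    return {name: np.ones_like(w) for name, w in frozen.items()}

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(x, masks):
    # Effective weight = frozen weight * task mask (element-wise product).
    w = {name: frozen[name] * masks[name] for name in frozen}
    q, k, v = x @ w["q"], x @ w["k"], x @ w["v"]
    scores = softmax(q @ k.T / np.sqrt(d_model))
    return (scores @ v) @ w["o"]

# One independent set of masks per task; the frozen weights are shared.
task_masks = {0: new_task_masks(), 1: new_task_masks()}
x = rng.standard_normal((4, d_model))  # 4 tokens

baseline = attention(x, new_task_masks())

# Stand-in for a gradient step on task 1: only that task's masks change.
task_masks[1]["v"] += 0.1 * rng.standard_normal((d_model, d_model))

out_task0 = attention(x, task_masks[0])
out_task1 = attention(x, task_masks[1])
```

In this sketch, adapting to task 1 leaves `out_task0` equal to `baseline`, because the frozen weights and task 0's masks are never modified. That isolation is the intuition behind using per-task masks to limit forgetting.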
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-incremental learning | CIFAR-100 (10-split) | Accuracy | 97.87 | 63 |
| Continual Learning | CIFAR-100 (10-split) | ACC | 92.35 | 54 |
| Class-incremental learning | 5-Datasets | FAA | 95.68 | 49 |
| Class-incremental learning | CUB-200 Split | FAA | 97.9 | 45 |
| Class-incremental learning | Split ImageNet-R 10 incremental tasks | Class Accuracy | 82.38 | 40 |
| Class-incremental learning | CIFAR100 20-Split | Accuracy | 99.05 | 38 |
| Class-incremental learning | CIFAR100 10-Split | Accuracy (ACC) | 98.25 | 22 |
| Class-incremental learning | ImageNet-R (10-Split) | Accuracy | 91.94 | 22 |