A Faster Path to Continual Learning
About
Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old tasks. However, C-Flat requires three additional gradient computations per iteration, imposing substantial overhead on the optimization process. In this work, we propose C-Flat Turbo, a faster yet stronger optimizer that significantly reduces the training cost. We show that the gradients associated with first-order flatness contain direction-invariant components relative to the proxy-model gradients, enabling us to skip redundant gradient computations in the perturbed ascent steps. Moreover, we observe that these flatness-promoting gradients progressively stabilize across tasks, which motivates a linear scheduling strategy with an adaptive trigger to allocate larger turbo steps for later tasks. Experiments show that C-Flat Turbo delivers a 1.0$\times$ to 1.25$\times$ speedup over C-Flat across a wide range of CL methods, while achieving comparable or even improved accuracy.
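The skipping idea described above can be illustrated with a minimal sketch. All names, thresholds, and the cosine-similarity trigger below are illustrative assumptions, not the paper's actual implementation: the scheduler recomputes the flatness gradient until its direction stabilizes (the adaptive trigger), then switches to a linear schedule that assigns larger turbo (skip) steps to later tasks.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two gradient vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

class TurboScheduler:
    """Hypothetical sketch of the C-Flat Turbo scheduling idea.

    Recompute the flatness gradient every step until its direction
    stabilizes; afterwards, skip recomputation on a linear schedule
    that grows with the task index. Parameter names and the trigger
    criterion are assumptions for illustration.
    """

    def __init__(self, num_tasks, max_skip=4, sim_threshold=0.9):
        self.num_tasks = num_tasks
        self.max_skip = max_skip            # largest turbo step (assumed)
        self.sim_threshold = sim_threshold  # adaptive trigger (assumed)
        self.cached_grad = None
        self.triggered = False

    def skip_interval(self, task_id):
        # Linear schedule: later tasks get larger turbo steps.
        frac = task_id / max(self.num_tasks - 1, 1)
        return 1 + int(frac * (self.max_skip - 1))

    def should_recompute(self, step, task_id, fresh_grad=None):
        if not self.triggered:
            # Before the trigger fires, always recompute; fire once the
            # flatness gradient's direction has stabilized.
            if fresh_grad is not None:
                if (self.cached_grad is not None and
                        cosine_sim(fresh_grad, self.cached_grad) > self.sim_threshold):
                    self.triggered = True
                self.cached_grad = fresh_grad
            return True
        # After the trigger: recompute only once per turbo interval.
        return step % self.skip_interval(task_id) == 0
```

Under this sketch, early tasks keep the full C-Flat gradient computation, while later tasks reuse the cached flatness direction for most steps, which is where the training-cost reduction comes from.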
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-incremental learning | ImageNet-R (B0 Inc20) | Last Accuracy | 77.83 | 79 |
| Class-incremental learning | CIFAR-100 (B0 Inc10) | Avg Accuracy | 94.45 | 43 |
| Class-incremental learning | CUB (B0 Inc10) | Last Accuracy | 89.02 | 39 |
| Class-incremental learning | ObjNet (B0 Inc10) | Avg Accuracy | 72.16 | 15 |
| Continual Learning | CIFAR-100 | Avg Accuracy | 69.48 | 12 |