A Faster Path to Continual Learning
About
Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old tasks. However, C-Flat requires three additional gradient computations per iteration, imposing substantial overhead on the optimization process. In this work, we propose C-Flat Turbo, a faster yet stronger optimizer that significantly reduces the training cost. We show that the gradients associated with first-order flatness contain direction-invariant components relative to the proxy-model gradients, enabling us to skip redundant gradient computations in the perturbed ascent steps. Moreover, we observe that these flatness-promoting gradients progressively stabilize across tasks, which motivates a linear scheduling strategy with an adaptive trigger to allocate larger turbo steps for later tasks. Experiments show that C-Flat Turbo delivers a 1.0$\times$ to 1.25$\times$ speedup over C-Flat across a wide range of CL methods, while achieving comparable or even improved accuracy.
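The skipping idea described above can be illustrated with a minimal sketch. All names, thresholds, and the cosine-similarity trigger below are illustrative assumptions, not the paper's actual implementation: the scheduler recomputes the flatness gradient until its direction stabilizes (the adaptive trigger), then switches to a linear schedule that assigns larger turbo (skip) steps to later tasks.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two gradient vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

class TurboScheduler:
    """Hypothetical sketch of the C-Flat Turbo scheduling idea.

    Recompute the flatness gradient every step until its direction
    stabilizes; afterwards, skip recomputation on a linear schedule
    that grows with the task index. Parameter names and the trigger
    criterion are assumptions for illustration.
    """

    def __init__(self, num_tasks, max_skip=4, sim_threshold=0.9):
        self.num_tasks = num_tasks
        self.max_skip = max_skip            # largest turbo step (assumed)
        self.sim_threshold = sim_threshold  # adaptive trigger (assumed)
        self.cached_grad = None
        self.triggered = False

    def skip_interval(self, task_id):
        # Linear schedule: later tasks get larger turbo steps.
        frac = task_id / max(self.num_tasks - 1, 1)
        return 1 + int(frac * (self.max_skip - 1))

    def should_recompute(self, step, task_id, fresh_grad=None):
        if not self.triggered:
            # Before the trigger fires, always recompute; fire once the
            # flatness gradient's direction has stabilized.
            if fresh_grad is not None:
                if (self.cached_grad is not None and
                        cosine_sim(fresh_grad, self.cached_grad) > self.sim_threshold):
                    self.triggered = True
                self.cached_grad = fresh_grad
            return True
        # After the trigger: recompute only once per turbo interval.
        return step % self.skip_interval(task_id) == 0
```

Under this sketch, early tasks keep the full C-Flat gradient computation, while later tasks reuse the cached flatness direction for most steps, which is where the training-cost reduction comes from.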
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-incremental learning | ImageNet-R (B0 Inc20) | Last Accuracy | 77.83 | 79 |
| Class-incremental learning | CIFAR-100 (B0 Inc10) | Avg Accuracy | 94.45 | 43 |
| Class-incremental learning | CUB (B0 Inc10) | Last Accuracy | 89.02 | 39 |
| Class-incremental learning | ObjNet (B0 Inc10) | Avg Accuracy | 72.16 | 15 |
| Continual Learning | CIFAR-100 | Avg Accuracy | 69.48 | 12 |