
Understanding the Role of Training Regimes in Continual Learning

About

Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially. From the perspective of the well-established plasticity-stability dilemma, neural networks tend to be overly plastic: they lack the stability necessary to preserve previous knowledge, so as learning progresses they forget previously seen tasks. This phenomenon, coined catastrophic forgetting in the continual learning literature, has attracted much attention lately, and several families of approaches have been proposed with different degrees of success. However, little prior work has extensively analyzed the impact that different training regimes (learning rate, batch size, regularization method) can have on forgetting. In this work, we depart from the typical approach of altering the learning algorithm to improve stability. Instead, we hypothesize that the geometric properties of the local minima found for each task play an important role in the overall degree of forgetting. In particular, we study how dropout, learning rate decay, and batch size form training regimes that widen the tasks' local minima and, consequently, help the network avoid catastrophic forgetting. Our study provides practical insights for improving stability via simple yet effective techniques that outperform alternative baselines.

Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, Hassan Ghasemzadeh • 2020
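
The techniques the abstract names (dropout, learning rate decay, small batch size) can be combined into a single sequential training loop. Below is a minimal PyTorch-style sketch of such a regime; the model architecture, the decay factor, and the choice to decay the learning rate once per task are illustrative assumptions, not the paper's exact hyperparameters or schedule.

```python
import torch
import torch.nn as nn

def train_sequentially(model, task_loaders, lr0=0.1, lr_decay=0.8, epochs=1):
    """Train `model` on a list of per-task dataloaders, one task at a time.

    lr0 and lr_decay are placeholder values, not the paper's tuned ones.
    """
    criterion = nn.CrossEntropyLoss()
    lr = lr0
    for loader in task_loaders:
        # Fresh SGD optimizer per task, using the (decayed) learning rate.
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()  # keeps dropout active during training
        for _ in range(epochs):
            for x, y in loader:  # small batches add gradient noise,
                optimizer.zero_grad()  # argued to favor wider minima
                loss = criterion(model(x), y)
                loss.backward()
                optimizer.step()
        lr *= lr_decay  # decay the learning rate before the next task

# Illustrative model with dropout (architecture chosen arbitrarily):
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.25),
    nn.Linear(256, 10),
)
```

With small batch sizes in the dataloaders, a learning rate that decays as tasks accumulate, and dropout in the network, each task tends to settle in a wider local minimum, which the paper links to reduced forgetting on earlier tasks.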

Related benchmarks

Task                            Dataset                     Metric                     Result   Rank
Continual Learning              CIFAR100 Split              Average Per-Task Accuracy  59.9     85
Continual Learning              Permuted MNIST              Mean Test Accuracy         80.1     44
Continual Image Classification  MiniImageNet Split          Accuracy                   51.81    29
Continual Learning              5-dataset                   Accuracy                   53.4     16
Lifelong Learning               Split miniImageNet (test)   Accuracy                   51.81    15
Lifelong Learning               5-dataset (test)            Accuracy                   46.51    15
Continual Learning              Rotated-MNIST               Accuracy                   70.8     13
Task-Incremental Learning       CIFAR-100 (20-split)        Accuracy                   57.4     12
Continual Learning              Split Mini-ImageNet         Avg Per-Task Accuracy      57.79    11
Lifelong Learning               CIFAR100 Split              Accuracy                   57.04    8

(Showing 10 of 14 rows.)
