Efficient Lifelong Learning with A-GEM
About
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new and more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which matches or even exceeds the performance of GEM while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms, including A-GEM, can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
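The abstract does not spell out the A-GEM update itself. As a rough illustration only, the sketch below assumes the commonly described rule: the gradient on the current batch is compared with a single reference gradient averaged over a sample from episodic memory, and when the two conflict (negative dot product) the conflicting component is projected out. The function name `agem_project` and the plain-list representation of gradients are hypothetical choices made for this example, not part of the paper.

```python
def agem_project(grad, ref_grad):
    """Project `grad` so it no longer conflicts with `ref_grad`.

    grad:     gradient on the current task batch (flat list of floats).
    ref_grad: gradient averaged over a sample from episodic memory
              (assumed, for this sketch, to be computed elsewhere).
    """
    dot = sum(g * r for g, r in zip(grad, ref_grad))
    if dot >= 0.0:
        # No conflict with the memory gradient: use the gradient as-is.
        return list(grad)
    # Conflict: subtract the component of grad that points against ref_grad.
    ref_sq = sum(r * r for r in ref_grad)
    scale = dot / ref_sq
    return [g - scale * r for g, r in zip(grad, ref_grad)]


# Usage: a gradient that would increase the loss on memory examples
# is projected until its dot product with the reference gradient is zero.
projected = agem_project([1.0, -1.0], [0.0, 1.0])
```

Because only one reference gradient is used, the projection needs a single dot product and a scaled subtraction per step, which is consistent with the abstract's claim that A-GEM is close in cost to regularization-based methods.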
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 6.46 | 756 |
| Reasoning | BBH | -- | -- | 507 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 53.92 | 329 |
| Generalized Zero-Shot Learning | CUB | -- | -- | 250 |
| Generalized Zero-Shot Learning | SUN | -- | -- | 184 |
| Generalized Zero-Shot Learning | AWA2 | S Score | 57.25 | 165 |
| Continual Learning | Sequential MNIST | Avg Acc | 98.93 | 149 |
| Text Classification | 20News | Accuracy | 93.31 | 101 |
| Continual Learning | CIFAR100 Split | Average Per-Task Accuracy | 62.3 | 85 |
| Incremental Learning | TinyImageNet | Avg Incremental Accuracy | 8.07 | 83 |