
Episodic Memory in Lifelong Language Learning

About

We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier. We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate that sparse experience replay and local adaptation provide complementary benefits, allowing the model to continuously learn from new datasets. We also show that the space complexity of the episodic memory module can be reduced significantly (by ~50-90%) by randomly choosing which examples to store in memory, with only a minimal decrease in performance. We consider an episodic memory component a crucial building block of general linguistic intelligence and see our model as a first step in that direction.
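To make the three mechanisms in the abstract concrete, below is a minimal PyTorch sketch of an episodic memory with random writes, sparse experience replay, and test-time local adaptation. The class and function names, the hyperparameters (write probability, replay interval, number of neighbours, adaptation steps), and the use of the raw input as the memory key are illustrative assumptions, not the paper's implementation; the paper's local adaptation also regularizes the adapted weights toward the base weights, which is omitted here for brevity.

```python
import copy
import random

import torch
import torch.nn.functional as F


class EpisodicMemory:
    """Key-value memory over seen examples; writes are randomly subsampled."""

    def __init__(self, write_prob=0.5):
        # write_prob < 1 implements the random storage described above:
        # e.g. 0.5 keeps roughly half of the stream.
        self.write_prob = write_prob
        self.keys = []       # one key vector per stored example
        self.examples = []   # (input, label) pairs, each with batch dim 1

    def write(self, key, x, y):
        if random.random() < self.write_prob:
            self.keys.append(key)
            self.examples.append((x, y))

    def sample(self, n):
        # Uniformly sample stored examples for sparse experience replay.
        idx = random.sample(range(len(self.examples)), min(n, len(self.examples)))
        return [self.examples[i] for i in idx]

    def neighbours(self, key, k):
        # K nearest stored examples in key space, used for local adaptation.
        keys = torch.stack(self.keys)                   # (n, d)
        dists = torch.cdist(key.unsqueeze(0), keys)[0]  # (n,)
        idx = dists.topk(min(k, len(self.examples)), largest=False).indices
        return [self.examples[int(i)] for i in idx]


def train_step(model, opt, memory, x, y, step, replay_every=100, replay_n=8):
    """One step on the stream; x has shape (1, d), y has shape (1,)."""
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()
    # Here the raw input doubles as the memory key (an assumption; a
    # separate key network could be used instead).
    memory.write(x.squeeze(0).detach(), x, y)
    # Sparse replay: only every `replay_every` steps, revisit a few
    # stored examples to mitigate catastrophic forgetting.
    if step % replay_every == 0 and memory.examples:
        for mx, my in memory.sample(replay_n):
            opt.zero_grad()
            F.cross_entropy(model(mx), my).backward()
            opt.step()


def locally_adapted_logits(model, memory, x, k=8, steps=5, lr=1e-2):
    """Local adaptation: fine-tune a throwaway copy of the model on the K
    neighbours of a test input, then predict with the adapted copy."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        for mx, my in memory.neighbours(x.squeeze(0).detach(), k):
            opt.zero_grad()
            F.cross_entropy(adapted(mx), my).backward()
            opt.step()
    return adapted(x)
```

At test time, `locally_adapted_logits` would be called once per example; only a temporary copy of the model is adapted, so the base parameters trained on the stream are left unchanged.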

Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama • 2019

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text Classification | Yahoo! Answers (test) | -- | -- | 133
Incremental Learning | TinyImageNet | Avg. Incremental Accuracy | 8.49 | 83
Text Classification | AGNews, Amazon, DBPedia, Yahoo, and Yelp (test) | Exact Match (EM) | 76.7 | 55
Text Classification | Yelp (test) | -- | -- | 55
Continual Learning | Large Number of Tasks | Average Performance | 7.4 | 50
Continual Learning | Standard CL Benchmark | BWT (Avg. Order 1-3) | 57.8 | 38
Image Classification | S-MNIST (test) | Average Accuracy | 99.2 | 18
Text Classification | AGNews, Yelp, Amazon, DBPedia, Yahoo (last epoch of last task) | EM Score | 74.9 | 15
Image Classification | S-CIFAR100 (test) | Average Accuracy | 60.1 | 14
Image Classification | S-TinyImageNet (test) | Average Accuracy | 35.6 | 14
(Showing 10 of 12 rows.)
