Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes
About
In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. If a sequence consists of thousands or even millions of elements, the intermediate data becomes prohibitively large to store, which makes learning very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted from temporally local information alone. Predictions affected by long-term dependencies, on the other hand, are sparse and characterized by high uncertainty when only local information is available. We propose MemUP, a new training method that learns long-term dependencies without backpropagating gradients through the whole sequence at once. The method can potentially be applied to any recurrent architecture. An LSTM network trained with MemUP performs better than or comparably to baselines while storing less intermediate data.
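The selection step described above — training memory only on the sequence elements that are hardest to predict from local context — can be sketched as follows. This is an illustrative example, not the paper's implementation: the entropy-based uncertainty score, the `select_uncertain_steps` helper, and the top-k selection are assumptions made for the sketch.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_uncertain_steps(local_predictions, k):
    """Pick the k timesteps whose local predictions are most uncertain.

    local_predictions: one probability distribution per timestep, produced
    by a predictor that sees only temporally local information.
    Returns the indices of the k highest-entropy steps -- the sparse set of
    predictions that likely depend on long-term memory and therefore serve
    as training targets for the memory module.
    """
    scores = [(entropy(p), t) for t, p in enumerate(local_predictions)]
    scores.sort(reverse=True)
    return sorted(t for _, t in scores[:k])

# Steps 0 and 2 are confidently predicted from local context alone;
# step 1 is near-uniform, so it becomes a memory training target.
preds = [[0.9, 0.1], [0.5, 0.5], [0.95, 0.05]]
targets = select_uncertain_steps(preds, k=1)  # -> [1]
```

Because only these few selected targets receive memory-related gradients, the rest of the sequence does not need its intermediate computations retained, which is what lets the method avoid backpropagation through the entire sequence.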
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Permuted Image Classification | pMNIST 3136 | Accuracy | 94.3 | 7 |
| Sequence Copying | Copy Length 120 | Accuracy | 100 | 7 |
| Permuted Image Classification | pMNIST 784 | Accuracy | 95.4 | 7 |
| Scattered Sequence Copying | Scattered Copy Length 5020 | Accuracy | 0.999 | 6 |
| Sequence Copying | Copy Length 5020 | Accuracy | 99.3 | 6 |
| Sequence Addition | Add Length 5000 | MSE | 0.0053 | 6 |