
Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

About

In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. This means storing prohibitively large amounts of intermediate data when a sequence consists of thousands or even millions of elements and, as a result, makes learning very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted from temporally local information alone. Predictions affected by long-term dependencies, on the other hand, are sparse and characterized by high uncertainty when only local information is available. We propose MemUP, a new training method that learns long-term dependencies without backpropagating gradients through the whole sequence at once. The method can potentially be applied to any recurrent architecture. An LSTM network trained with MemUP performs better than or comparably to baselines while storing less intermediate data.
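The key idea in the abstract is that only a sparse set of hard-to-predict timesteps should become training targets for the memory module. Below is a toy sketch of that selection step, assuming we already have per-timestep class probabilities from a purely local predictor; the function names are illustrative and the paper's actual uncertainty estimate and selection rule may differ.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_uncertain_targets(local_probs, k):
    """Rank timesteps by the entropy of the local model's prediction
    and return the k most uncertain ones (in temporal order).
    These would serve as prediction targets for the memory module,
    so gradients need not flow through every timestep."""
    scores = [(entropy(p), t) for t, p in enumerate(local_probs)]
    scores.sort(reverse=True)
    return sorted(t for _, t in scores[:k])

# Toy example: timestep 1 is maximally uncertain locally,
# so it is the one a long-term memory should learn to explain.
probs = [[0.99, 0.01], [0.5, 0.5], [0.9, 0.1]]
print(select_uncertain_targets(probs, 1))  # -> [1]
```

Because only the selected timesteps contribute to the memory's loss, the remaining (locally predictable) elements can be processed with truncated backpropagation, which is what keeps the stored intermediate data small.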

Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev • 2022

Related benchmarks

Task | Dataset | Result | Rank
Permuted Image Classification | pMNIST 3136 | Accuracy 94.3 | 7
Sequence Copying | Copy Length 120 | Accuracy 100 | 7
Permuted Image Classification | pMNIST 784 | Accuracy 95.4 | 7
Scattered Sequence Copying | Scattered Copy Length 5020 | Accuracy 0.999 | 6
Sequence Copying | Copy Length 5020 | Accuracy 99.3 | 6
Sequence Addition | Add Length 5000 | MSE 0.0053 | 6

Other info

Code
