Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

About

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
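The core idea, self-attention applied across the dataset axis so that one datapoint's prediction can depend on other datapoints, can be illustrated with a short sketch. Below is a minimal PyTorch example, not the authors' implementation: all names and sizes are illustrative, and the full Non-Parametric Transformer also alternates this datapoint-wise attention with attention between attributes, which the sketch omits. The whole dataset is embedded and passed through one multi-head self-attention layer whose "sequence" axis is the set of datapoints.

```python
import torch
import torch.nn as nn

class DatapointSelfAttention(nn.Module):
    """Minimal sketch of attention *between datapoints*: the entire
    dataset (n rows) is one input, and self-attention mixes information
    across rows, so each prediction can depend on the other datapoints.
    Illustrative simplification, not the authors' NPT code."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (1, n_datapoints, d_model). The sequence axis is the dataset
        # axis, so every row attends to every other row.
        out, _ = self.attn(h, h, h)
        return self.norm(h + out)  # residual connection + layer norm

# Toy usage: embed a small tabular dataset and let rows attend to each other.
n, d_features, d_model = 8, 5, 64
embed = nn.Linear(d_features, d_model)  # per-datapoint feature embedding
head = nn.Linear(d_model, 1)            # per-datapoint prediction head

X = torch.randn(1, n, d_features)       # the entire dataset as a single input
layer = DatapointSelfAttention(d_model)
preds = head(layer(embed(X)))           # (1, n, 1): each row's prediction
                                        # can use information from all rows
```

Because the attention weights here are row-to-row, a model of this form can learn lookup-style behavior, such as attending to a similar datapoint when making a prediction, which a standard parametric model that sees one input at a time cannot express.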

Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tabular Data Classification | UCI machine learning repository, 21 datasets (test) | Median Rank | 11 | 29 |
| Classification | DVS-Gesture (test) | Accuracy | 67.83 | 14 |
| Tabular Classification | UCI machine learning repository, small-sized (test) | Median Rank | 11 | 7 |
| Classification | blastchar, medium-sized (test) | Accuracy | 79.98 | 5 |
| Regression | colleges, medium-sized (test) | MSE (×1000) | 25.67 | 5 |
| Classification | shrutime, medium-sized (test) | Accuracy | 85.62 | 5 |
| Classification | eye, medium-sized (test) | Accuracy | 53.21 | 5 |
| Regression | sulfur, medium-sized (test) | MSE (×1000) | 1.24 | 5 |