Supervised learning pays attention

About

In-context learning with attention enables large neural networks to make context-specific predictions by selectively focusing on relevant examples. Here, we adapt this idea to supervised learning procedures such as lasso regression and gradient boosting, for tabular data. Our goals are to (1) flexibly fit personalized models for each prediction point and (2) retain model simplicity and interpretability. Our method fits a local model for each test observation by weighting the training data according to attention, a supervised similarity measure that emphasizes features and interactions that are predictive of the outcome. Attention weighting allows the method to adapt to heterogeneous data in a data-driven way, without requiring cluster or similarity pre-specification. Further, our approach is uniquely interpretable: for each test observation, we identify which features are most predictive and which training observations are most relevant. We then show how to use attention weighting for time series and spatial data, and we present a method for adapting pretrained tree-based models to distributional shift using attention-weighted residual corrections. Across real and simulated datasets, attention weighting improves predictive performance while preserving interpretability, and theory shows that attention-weighting linear models attain lower mean squared error than the standard linear model under mixture-of-models data-generating processes with known subgroup structure.

Erin Craig, Robert Tibshirani • 2025
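The local-model idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simplified stand-in for the attention measure: a softmax over negative squared distances, with each feature scaled by its squared coefficient from a global ridge fit. Ridge regression is swapped in for the lasso so that the example needs only NumPy. The function names (`ridge_fit`, `attention_weights`, `predict_local`), the `temperature` and `lam` parameters, and the toy data are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, sample_weight=None, lam=1e-3):
    """Weighted ridge regression with an unpenalized intercept (closed form)."""
    n, p = X.shape
    w = np.full(n, 1.0 / n) if sample_weight is None else np.asarray(sample_weight, float)
    w = w / w.sum()
    x_mean, y_mean = w @ X, w @ y
    Xc, yc = X - x_mean, y - y_mean
    A = Xc.T @ (w[:, None] * Xc) + lam * np.eye(p)
    coef = np.linalg.solve(A, Xc.T @ (w * yc))
    return coef, y_mean - x_mean @ coef

def attention_weights(X_train, x_star, importance, temperature=1.0):
    """Softmax over negative importance-weighted squared distances to x_star.

    Assumption: this distance-based score is a simplified stand-in for the
    supervised similarity measure described in the abstract.
    """
    scores = -(((X_train - x_star) ** 2) @ importance) / temperature
    scores = scores - scores.max()  # subtract the max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def predict_local(X_train, y_train, x_star, temperature=1.0, lam=1e-3):
    """Fit one weighted linear model per test point and predict with it."""
    # Step 1: a global fit tells us which features matter for the outcome.
    coef_global, _ = ridge_fit(X_train, y_train, lam=lam)
    importance = coef_global ** 2
    importance = importance / (importance.sum() + 1e-12)
    # Step 2: weight training rows by their supervised similarity to x_star.
    w = attention_weights(X_train, x_star, importance, temperature)
    # Step 3: refit with those weights to get a local, interpretable model.
    coef, intercept = ridge_fit(X_train, y_train, sample_weight=w, lam=lam)
    return x_star @ coef + intercept, coef, w

# Toy heterogeneous data (hypothetical): two subgroups with different slopes.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 100)
x0 = 4.0 * group + 0.1 * rng.standard_normal(200)   # feature that separates groups
x1 = rng.standard_normal(200)
X = np.column_stack([x0, x1])
y = 5.0 * group + np.where(group == 0, 1.0, 3.0) * x1 + 0.1 * rng.standard_normal(200)

x_star = np.array([4.0, 2.0])          # a test point in the second subgroup
true_value = 5.0 + 3.0 * 2.0           # noiseless outcome at x_star

coef_g, intercept_g = ridge_fit(X, y)
global_pred = x_star @ coef_g + intercept_g
local_pred, local_coef, w = predict_local(X, y, x_star)

print("global error:", abs(global_pred - true_value))
print("local error: ", abs(local_pred - true_value))
print("weight on same-subgroup rows:", w[group == 1].sum())
```

The returned `local_coef` and `w` correspond to the two interpretability outputs the abstract mentions: which features drive the prediction for this test point, and which training observations were most relevant. The sketch assumes features on comparable scales; in practice one would likely standardize them first.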

Related benchmarks

Task        Dataset                  Metric                                     Result  Rank
Regression  Auto MPG                 Mean Relative Improvement (%)              31.8    5
Regression  automobile               Mean Relative Improvement over Lasso       37.1    5
Regression  Stock Portfolio Perf.    Mean Relative Improvement (%)              60.4    5
Regression  Facebook Metrics         Mean Relative Improvement (Lasso)          93.6    5
Regression  Forest Fires             Mean Rel. Improvement over Lasso (%)       -40     5
Regression  Infrared Therm. Temp.    Mean Relative Improvement over Lasso (%)   3.4     5
Regression  Servo                    Mean Relative Improvement (%)              63.8    5
Regression  Airfoil Self-Noise       Mean Relative Improvement (%)              75      5
Regression  Real Estate Valuation    Mean Relative Improvement over Lasso (%)   18.2    5
Regression  Concrete Comp. Strength  Mean Relative Improvement (%)              62.8    5
Showing 10 of 18 rows
