Towards a Relationship-Aware Transformer for Tabular Data

About

Deep learning models for tabular data typically do not allow a graph of external dependencies between samples to be imposed, although such a graph can help account for relatedness in tasks such as treatment effect estimation. Graph neural networks only consider adjacent nodes, which makes them difficult to apply to sparse graphs. This paper proposes several solutions based on a modified attention mechanism, which accounts for possible relationships between data points by adding a term to the attention matrix. Our models are compared with each other and with gradient boosting decision trees in a regression task on synthetic and real-world datasets, as well as in a treatment effect estimation task on the IHDP dataset.
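The idea of adding a relationship term to the attention matrix can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `relationship_aware_attention`, the encoding of the relationship matrix `R`, the weight `gamma`, and the choice to add the term to the scores before the softmax are all assumptions made for illustration.

```python
import numpy as np

def relationship_aware_attention(X, R, Wq, Wk, Wv, gamma=1.0):
    """Attention across samples with an additive relationship term.

    X: (n, d) feature matrix, one row per sample.
    R: (n, n) relationship matrix (hypothetical encoding: larger = more related).
    gamma: hypothetical weight of the relationship term.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])        # standard scaled dot-product scores
    scores = scores + gamma * R                   # assumed: term added before the softmax
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ V, weights

# Toy data: six samples, two features, samples 0 and 1 marked as related.
rng = np.random.default_rng(0)
n, d, h = 6, 2, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, h)) for _ in range(3))
R = np.zeros((n, n))
R[0, 1] = R[1, 0] = 5.0

out, weights = relationship_aware_attention(X, R, Wq, Wk, Wv)
_, base = relationship_aware_attention(X, R, Wq, Wk, Wv, gamma=0.0)
```

Because every sample attends to every other sample, a relationship between two points can shift their attention weights even when the graph is sparse. Here `weights[0, 1]` is larger than `base[0, 1]`, the weight obtained when the relationship term is switched off.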

Andrei V. Konstantinov, Valerii A. Zuev, Lev V. Utkin • 2025

Related benchmarks

Task	Dataset	Result
Treatment Effect Estimation	IHDP	Mean PEHE: 3.3	27
Regression	Synthetic two-feature data (Linear, n=300)	Mean R^2: 0.9839	9
Regression	Synthetic two-feature data (Linear, n=1000)	Mean R^2: 0.9935	9
Regression	Synthetic two-feature data (Square, n=300)	Mean R^2: 0.9695	9
Regression	Synthetic two-feature data (Square, n=1000)	Mean R^2: 0.9874	9
Regression	Synthetic two-feature data (Sin, n=300)	Mean R^2: 0.9777	9
Regression	Synthetic two-feature data (Sin, n=1000)	Mean R^2: 0.991	9
Regression	Birds	MSE: 0.235	8
Regression	Life Expectancy	Mean MSE: 26.4	8
Regression	Synthetic Parabolas, deterministic R matrix (test)	Mean MSE: 0.002	6

Showing 10 of 11 rows
