Towards a Relationship-Aware Transformer for Tabular Data
About
Deep learning models for tabular data typically cannot incorporate an external graph of dependencies between samples, even though such a graph can help account for relatedness in tasks such as treatment effect estimation. Graph neural networks only consider adjacent nodes, which makes them difficult to apply to sparse graphs. This paper proposes several solutions based on a modified attention mechanism that accounts for possible relationships between data points by adding a term to the attention matrix. The proposed models are compared with each other and with gradient boosting decision trees in a regression task on synthetic and real-world datasets, as well as in a treatment effect estimation task on the IHDP dataset.
Andrei V. Konstantinov, Valerii A. Zuev, Lev V. Utkin • 2025
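The abstract's core idea, adding a relationship term to the attention matrix, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `relation_biased_attention`, the relationship matrix `R`, the scaling weight `gamma`, and the choice to add the term to the pre-softmax scores are all assumptions made for illustration only.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def relation_biased_attention(X, R, W_q, W_k, W_v, gamma=1.0):
    """Scaled dot-product attention over samples with an additive relation term.

    X:     (n, d_in) feature matrix, one row per sample.
    R:     (n, n) hypothetical relationship matrix between samples.
    gamma: hypothetical weight of the relation term (assumption).
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[1]
    # Standard attention scores plus the externally supplied relation term.
    scores = Q @ K.T / np.sqrt(d_k) + gamma * R
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d_in, d_k = 5, 3, 4
X = rng.normal(size=(n, d_in))
W_q, W_k, W_v = (rng.normal(size=(d_in, d_k)) for _ in range(3))

# Toy relation graph: only samples 0 and 1 are marked as related.
R = np.zeros((n, n))
R[0, 1] = R[1, 0] = 1.0

out, weights = relation_biased_attention(X, R, W_q, W_k, W_v, gamma=2.0)
_, base_weights = relation_biased_attention(X, R, W_q, W_k, W_v, gamma=0.0)
```

In this toy setup, sample 0 attends more strongly to sample 1 than it would without the relation term, while samples with no marked relations keep their original attention weights. Every sample can still attend to every other one, so a sparse `R` only biases the weights rather than restricting them to adjacent nodes.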
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Treatment Effect Estimation | IHDP | Mean PEHE | 3.3 | 24 |
| Regression | Synthetic two-feature data (Linear, n=300) | Mean R² | 0.9839 | 9 |
| Regression | Synthetic two-feature data (Linear, n=1000) | Mean R² | 0.9935 | 9 |
| Regression | Synthetic two-feature data (Square, n=300) | Mean R² | 0.9695 | 9 |
| Regression | Synthetic two-feature data (Square, n=1000) | Mean R² | 0.9874 | 9 |
| Regression | Synthetic two-feature data (Sin, n=300) | Mean R² | 0.9777 | 9 |
| Regression | Synthetic two-feature data (Sin, n=1000) | Mean R² | 0.991 | 9 |
| Regression | Birds | MSE | 0.235 | 8 |
| Regression | Life Expectancy | Mean MSE | 26.4 | 8 |
| Regression | Synthetic Parabolas deterministic R matrix (test) | Mean MSE | 0.002 | 6 |
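The listing does not define the metrics it reports. As a rough guide, the sketch below uses the commonly adopted definitions, which are an assumption here: PEHE as the root-mean-squared error between true and estimated individual treatment effects (some papers report the squared version instead), MSE as the mean squared error, and R² as the coefficient of determination. The "Mean" labels presumably refer to averaging over repeated runs, which is not shown.

```python
import numpy as np

def pehe(tau_true, tau_pred):
    # Root-mean-squared error of individual treatment effect estimates
    # (assumed definition; the squared form is also used in the literature).
    tau_true, tau_pred = np.asarray(tau_true, float), np.asarray(tau_pred, float)
    return float(np.sqrt(np.mean((tau_pred - tau_true) ** 2)))

def mse(y_true, y_pred):
    # Mean squared error between targets and predictions.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_pred - y_true) ** 2))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Under these definitions, lower is better for PEHE and MSE, and higher is better for R², with 1.0 indicating a perfect fit.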