FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
About
Structured data is widely used in domains such as healthcare, finance, and scientific data management. Recent studies on structured data foundation models (SFMs) aim to support data analysis and mining tasks over such data, but still face scalability and generalization challenges when applied to real-world enterprise databases. First, many SFMs rely on full self-attention, which introduces an O(N^2) computational bottleneck and limits the number of tuples that can be processed jointly. Second, directly replacing attention with linear-complexity sequence models may conflict with the permutation-invariant nature of structured data, introducing artificial order bias and degrading representation quality. Third, models trained only on synthetic data may struggle to generalize to the heavy-tailed and heterogeneous distributions commonly found in real-world databases. To address these challenges, we propose FEAT, a linear-complexity foundation model for extremely large structured data. FEAT replaces quadratic attention with a multi-layer dual-axis encoding architecture. It integrates an adaptive-fusion bidirectional state-space model (AFBM) with convolutional gated linear attention (Conv-GLA), enabling cross-tuple contextualization in O(N) time while supporting permutation-invariant representation learning. To improve robustness under real-world data skewness, FEAT further adopts a hybrid structural causal pre-training pipeline with a robust reconstruction objective. Experiments on 12 real-world database benchmarks show that FEAT consistently outperforms representative SFMs on zero-shot tasks and scales linearly with structured-data sample length, achieving up to 50x faster inference.
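To make the complexity and permutation arguments concrete, here is a minimal NumPy sketch of generic non-causal kernelized linear attention. It is not FEAT's AFBM or Conv-GLA; the feature map (`elu(x) + 1`), the function names, and the toy shapes are all assumptions chosen for illustration. The sketch shows two things the abstract relies on: summarizing all tuples into a fixed-size state gives O(N) cost instead of an N x N score matrix, and because that state is a sum over tuples, reordering the tuples only reorders the outputs.

```python
import numpy as np


def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumed choice, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v):
    """Non-causal linear attention over N tuples in O(N) time."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv_state = phi_k.T @ v          # (d, d_v): sum over tuples of phi(k_j) v_j^T
    k_sum = phi_k.sum(axis=0)       # (d,):     sum over tuples of phi(k_j)
    numerator = phi_q @ kv_state    # (N, d_v)
    denominator = phi_q @ k_sum     # (N,)
    return numerator / denominator[:, None]


def quadratic_attention(q, k, v):
    """Same kernel, but materializing the full N x N weight matrix."""
    scores = feature_map(q) @ feature_map(k).T
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
n_tuples, dim = 6, 4
q, k, v = rng.normal(size=(3, n_tuples, dim))

out = linear_attention(q, k, v)

# Shuffle the tuples: the fixed-size state is a sum, so it does not change,
# and each tuple's output simply moves with it.
perm = rng.permutation(n_tuples)
out_perm = linear_attention(q[perm], k[perm], v[perm])

print(np.allclose(out, quadratic_attention(q, k, v)))
print(np.allclose(out_perm, out[perm]))
```

A causal or recurrent variant would replace the global sums with running prefix sums, which is where the order dependence described in the abstract comes from; the sketch above avoids it only because it aggregates over all tuples at once.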
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Inference | Scalability and Efficiency Evaluation D=20 (test) | Inference Latency (ms) | 149.2 | 26 |
| Regression | GI-REG | RMSE | 0.4703 | 10 |
| Regression | BCCO-REG | RMSE | 0.406 | 10 |
| Classification | GI-CLS | AUC | 0.8991 | 9 |
| Classification | Tabarena CLS | AUC | 0.8638 | 9 |
| Classification | Tabzilla CLS | AUC | 92.51 | 9 |
| Regression | CTR23-REG | RMSE | 0.4053 | 9 |
| Regression | PFN REG | RMSE | 0.5257 | 9 |
| Classification | BCCO-CLS | AUC | 85.79 | 9 |
| Regression | Talent-REG | RMSE | 0.4708 | 9 |