FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data

About

Structured data is widely used in domains such as healthcare, finance, and scientific data management. Recent studies on structured data foundation models (SFMs) aim to support data analysis and mining tasks over such data, but still face scalability and generalization challenges when applied to real-world enterprise databases. First, many SFMs rely on full self-attention, which introduces an O(N^2) computational bottleneck and limits the number of tuples that can be processed jointly. Second, directly replacing attention with linear-complexity sequence models may conflict with the permutation-invariant nature of structured data, introducing artificial order bias and degrading representation quality. Moreover, models trained only on synthetic data may struggle to generalize to the heavy-tailed and heterogeneous distributions commonly found in real-world databases. To address these challenges, we propose FEAT, a linear-complexity foundation model for extremely large structured data. FEAT replaces quadratic attention with a multi-layer dual-axis encoding architecture. It integrates an adaptive-fusion bidirectional state-space model (AFBM) with convolutional gated linear attention (Conv-GLA), enabling cross-tuple contextualization in O(N) time while supporting permutation-invariant representation learning. To improve robustness under real-world data skewness, FEAT further adopts a hybrid structural causal pre-training pipeline with a robust reconstruction objective. Experiments on 12 real-world database benchmarks show that FEAT consistently outperforms representative SFMs on zero-shot tasks and scales linearly with structured-data sample length, achieving up to 50x faster inference.
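To make the O(N^2) versus O(N) contrast concrete, here is a minimal NumPy sketch of generic non-causal kernelized linear attention. It is not FEAT's AFBM or Conv-GLA, whose details are not given here. The feature map `elu(x) + 1` and the function names are assumptions chosen for illustration. The sketch shows two things: the linear form never builds the N x N score matrix, and, without a causal mask, reordering the tuples only reorders the outputs, so no order bias is introduced.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def quadratic_attention(q, k, v):
    # Reference form: builds the full (N, N) score matrix, O(N^2) in tuples.
    q, k = feature_map(q), feature_map(k)
    scores = q @ k.T
    return (scores / scores.sum(axis=1, keepdims=True)) @ v


def linear_attention(q, k, v):
    # Same result via associativity: summarize keys/values once, O(N) in tuples.
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v           # (d, d_v) summary accumulated over all N tuples
    z = k.sum(axis=0)      # (d,) normalizer accumulated over all N tuples
    return (q @ kv) / (q @ z)[:, None]
```

Both functions take arrays of shape `(N, d)` for `q` and `k` and `(N, d_v)` for `v`. Because `kv` and `z` are sums over tuples, they do not depend on tuple order, which is why this non-causal variant is permutation-equivariant. Recurrent or causal linear-complexity models lack that property by default, which is the order-bias concern raised above.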

Zhenghang Song, Tang Qian, Lu Chen, Yushuai Li, Zhengke Hu, Bingbing Fang, Yumeng Song, Junbo Zhao, Sheng Zhang, Tianyi Li • 2026

Related benchmarks

Task	Dataset	Result
Inference	Scalability and Efficiency Evaluation D=20 (test)	Inference Latency (ms) 149.2	26
Regression	GI-REG	RMSE 0.4703	10
Regression	BCCO-REG	RMSE 0.406	10
Classification	GI-CLS	AUC 0.8991	9
Classification	Tabarena CLS	AUC 0.8638	9
Classification	Tabzilla CLS	AUC 92.51	9
Regression	CTR23-REG	RMSE 0.4053	9
Regression	PFN REG	RMSE 0.5257	9
Classification	BCCO-CLS	AUC 85.79	9
Regression	Talent-REG	RMSE 0.4708	9

Showing 10 of 12 rows
