Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

About

Tabular data forms the backbone of high-stakes decision systems in finance, healthcare, and beyond. Yet industrial tabular datasets are inherently difficult: high-dimensional, riddled with missing entries, and rarely labeled at scale. While foundation models have revolutionized vision and language, tabular learning still leans on handcrafted features and lacks a general self-supervised framework. We present MaskTab, a unified pre-training framework designed specifically for industrial-scale tabular data. MaskTab encodes missing values via dedicated learnable tokens, enabling the model to distinguish structural absence from random dropout. It jointly optimizes a hybrid supervised pre-training scheme--utilizing a twin-path architecture to reconcile masked reconstruction with task-specific supervision--and an MoE-augmented loss that adaptively routes features through specialized subnetworks. On industrial-scale benchmarks, it achieves +5.04% AUC and +8.28% KS over prior art under rigorous scaling. Moreover, its representations distill effectively into lightweight models, yielding +2.55% AUC and +4.85% KS under strict latency and interpretability constraints, while improving robustness to distribution shifts. Our work demonstrates that tabular data admits a foundation-model treatment--when its structural idiosyncrasies are respected.

Bo Zheng, Yudong Chen, Zihua Xiong, Shuai Fang, Peidong He, Yang Yang, Sheng Guo• 2026

Related benchmarks

TaskDatasetResultRank
Tabular LearningTabReD
Rank2.3
33
ClassificationTabReD
Accuracy (Homesite Insurance)96.35
15
RegressionTabReD Regression
Sberbank Housing Score0.2345
14
Binary ClassificationCreditRisk
ROC AUC75.84
9
Showing 4 of 4 rows

Other info

Follow for update