
LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

About

We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX-16M and LimiX-2M, two instantiations of our large structured-data models (LDMs). Both models treat structured data as a joint distribution over variables and missingness, and can therefore address a wide range of tabular tasks through query-based conditional prediction with a single model. They are pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, which supports rapid, training-free adaptation at inference. We evaluate the LimiX models across 11 large structured-data benchmarks spanning broad regimes of sample size, feature dimensionality, class count, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratio. LimiX-16M consistently surpasses strong baselines, as shown in Figure 1 and Figure 2. This advantage holds across a wide range of tasks, including classification, regression, missing-value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke per-task training. Notably, LimiX-2M delivers strong results under tight compute and memory budgets. We also present the first scaling-law study for LDMs, revealing how data and model scaling jointly influence downstream performance and offering quantitative guidance for tabular foundation modeling. All LimiX models are publicly available under the Apache 2.0 license.
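To make the episodic, context-conditional setup concrete, the sketch below illustrates the general idea in miniature: a table is treated as rows with some cells masked, and masked query cells are predicted conditionally on fully observed context rows. This is a hypothetical toy illustration only; the synthetic data, the masking rate, and the k-nearest-neighbour "prediction head" are all our assumptions, not the LimiX architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy table: 100 rows, 4 numeric features with some dependence,
# standing in for structured data drawn from a joint distribution.
X = rng.normal(size=(100, 4))
X[:, 3] = 0.7 * X[:, 0] - 0.2 * X[:, 1] + 0.1 * rng.normal(size=100)

# Episodic setup: context rows are observed in full; query rows have
# randomly masked cells whose values must be predicted conditionally.
context, query = X[:80], X[80:].copy()
mask = rng.random(query.shape) < 0.3   # True = masked (unobserved)
query[mask] = np.nan

def predict_masked(context, query_row, mask_row, k=5):
    """Fill masked cells of one query row by conditioning on context rows.

    A k-nearest-neighbour average over the observed coordinates is a
    crude stand-in for a learned conditional prediction head.
    """
    obs = ~mask_row
    # Distance to each context row, using only observed coordinates.
    dists = np.linalg.norm(context[:, obs] - query_row[obs], axis=1)
    nearest = np.argsort(dists)[:k]
    filled = query_row.copy()
    filled[mask_row] = context[nearest][:, mask_row].mean(axis=0)
    return filled

imputed = np.vstack([predict_masked(context, q, m)
                     for q, m in zip(query, mask)])
```

The same context/query interface covers classification, regression, and imputation alike: each task is just a different choice of which cells are masked at query time, which is why a single pretrained model can serve all of them without task-specific heads.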

Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Han, Xuanyue Li, Yan Lu, Yuan Xue, Yuanyuan Jiang, Zimu Wang, Zhenlei Wang, Peng Cui• 2025

Related benchmarks

Task                         Dataset      ROC-AUC   Rank
User Clicks Prediction       rel-avito    60.29     84
User Engagement Prediction   rel-stack    82.95     69
Driver Top 3 Prediction      rel-f1       80.98     54
User Churn Prediction        rel-amazon   63.38     54
Item Churn Prediction        rel-amazon   77.00     54
Driver DNF Prediction        rel-f1       70.69     54
Study Outcome Prediction     rel-trial    59.72     52
User Churn Prediction        rel-hm       63.37     52
User Ignore Prediction       rel-event    79.48     50
User Repeat Prediction       rel-event    63.04     50
(Showing 10 of 29 benchmark rows.)
