RelAgent: LLM Agents as Data Scientists for Relational Learning

About

Relational learning is a challenging problem that has motivated a wide range of approaches, including graph-based models (e.g., graph neural networks, graph transformers), tabular methods (e.g., tabular foundation models), and sequence-based approaches (e.g., large language models), each with its own advantages and limitations. We propose RelAgent, an LLM-based autonomous data scientist for relational learning, which operates in two phases. In the search phase, an LLM agent uses database, validation, and evaluation workspace tools to construct SQL feature programs and select a predictive model. In the inference phase, the resulting program is executed without further LLM calls. The final predictor consists of SQL queries and a classical model, which makes predictions fast, deterministic, and intrinsically interpretable: features are human-readable queries, and predictions depend only on the resulting query-defined feature map. The predictor can therefore be deployed at scale on standard database systems.
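To make the two-phase idea concrete, here is a minimal sketch of what an inference-phase predictor of this shape could look like: one SQL feature query plus a simple classical model, with no LLM involved at prediction time. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `users`/`orders` schema, the `FEATURE_SQL` query (which in RelAgent would be produced by the agent during the search phase), and the one-feature threshold rule standing in for the classical model.

```python
import sqlite3

def build_db():
    """Create a hypothetical toy database (illustrative schema, not from the paper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, churned INTEGER);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
        INSERT INTO users VALUES (1, 0), (2, 0), (3, 1), (4, 1);
        INSERT INTO orders (user_id, amount) VALUES
            (1, 20.0), (1, 35.0), (1, 10.0),
            (2, 50.0), (2, 5.0), (2, 8.0), (2, 12.0),
            (3, 7.0);
    """)
    return conn

# The "feature program": a human-readable SQL query, one row per entity.
# Assumed here by hand; a search phase would be what proposes and validates it.
FEATURE_SQL = """
    SELECT u.user_id,
           COUNT(o.order_id)            AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id
    ORDER BY u.user_id
"""

def extract_features(conn):
    """Run the feature query and return {entity_id: (feature values...)}."""
    return {row[0]: tuple(row[1:]) for row in conn.execute(FEATURE_SQL)}

def load_labels(conn):
    """Read the training labels for each entity."""
    return dict(conn.execute("SELECT user_id, churned FROM users"))

def fit_threshold_rule(features, labels):
    """Stand-in classical model: predict 1 when feature j <= threshold.

    Tries every feature and every observed value as a threshold and keeps
    the first rule with the highest training accuracy.
    """
    best_rule, best_acc = None, -1.0
    n_features = len(next(iter(features.values())))
    for j in range(n_features):
        for thr in sorted({f[j] for f in features.values()}):
            correct = sum(int(f[j] <= thr) == labels[k] for k, f in features.items())
            acc = correct / len(features)
            if acc > best_acc:
                best_rule, best_acc = (j, thr), acc
    return best_rule

def predict(conn, rule):
    """Inference: run the SQL features, then apply the fitted rule. No LLM calls."""
    j, thr = rule
    return {k: int(f[j] <= thr) for k, f in extract_features(conn).items()}

if __name__ == "__main__":
    conn = build_db()
    rule = fit_threshold_rule(extract_features(conn), load_labels(conn))
    print("rule (feature index, threshold):", rule)
    print("predictions:", predict(conn, rule))
```

Because the prediction depends only on the query output and the fitted rule, rerunning `predict` on the same database yields the same result, which is the sense in which a predictor of this form is deterministic and readable.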

Xingyue Huang, Louis Tichelman, Jinwoo Kim, Krzysztof Olejniczak, İsmail İlkan Ceylan • 2026

Related benchmarks

Task	Dataset	Metric	Result
Entity Regression	RelBench v1.0 (test)	CTR (Avito Ad)	3.3	45
Entity Regression	RelBench V1	Avito CTR Error (MAE)	0.033	26
Regression	RelBench v2 (test)	MAE (RateBeer User-Count)	6.021	21
Binary Classification	4DBInfer (test)	AUROC (AB Churn)	0.7944	20
Binary Classification	RelBench V1	Avito Click AUROC	68.36	20
Entity Classification	4DBInfer (test)	Churn Rate (AB)	79.44	12
Binary Classification	RelBench v2	RateBeer Beer Churn Score	84.7	8
Entity Classification	RelBench v2 (test)	RateBeer Beer AUROC	84.7	5
