RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases

About

Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting an RDB into a graph, in which rows are represented as nodes and inter-table interactions as edges, and then applying graph-based models for representation learning. Despite the strong capability of RDL, effective self-supervised pre-training for RDBs remains non-trivial. RDB tasks often require multi-faceted information spanning different perspectives and granularities. For example, user churn classification may rely more on interaction patterns, whereas consumption value prediction requires both user-item behaviors and intrinsic user attributes for fine-grained regression. Such heterogeneous needs make RDB representation learning challenging, because pre-training objectives must capture information comprehensive enough to support adaptation to diverse downstream tasks. However, existing self-supervised learning (SSL) methods typically derive supervision from a single facet, such as node-level intrinsic attributes or subgraph-level relational structures, which limits their adaptability. To this end, we propose RelPrism, a multi-faceted self-supervised learning framework for RDBs. RelPrism constructs intrinsic, relational, and hybrid attributes from distinct perspectives, and applies multi-granularity clustering to each perspective to form corresponding pseudo-task pools. Pre-training over these pools exposes representations to a broader range of perspectives and granularity levels, yielding a stronger basis for downstream adaptation. Experiments on 14 tasks across 5 real-world datasets show that RelPrism improves ROC-AUC by 4.15% for classification and reduces MAE by 10.75% for regression over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/RelPrism.
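The pipeline summarized above (rows as nodes, foreign-key links as edges, per-facet attributes, multi-granularity clustering into pseudo-task pools) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not RelPrism's implementation: the toy tables, mean-neighbour aggregation as the "relational" facet, concatenation as the "hybrid" facet, and plain k-means at k = 2 and k = 3 as the clustering step. The function names (`build_adjacency`, `relational_attrs`, `build_pseudo_task_pools`) are hypothetical; the authors' code is at the linked repository.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy RDB (illustrative only): a users table, an items table,
# and a purchases link table whose (user_id, item_id) foreign keys become edges.
USERS = {"u1": [23.0, 0.0], "u2": [35.0, 1.0], "u3": [41.0, 1.0], "u4": [19.0, 0.0]}
ITEMS = {"i1": [9.99], "i2": [120.0], "i3": [45.5]}
PURCHASES = [("u1", "i1"), ("u1", "i3"), ("u2", "i2"),
             ("u3", "i2"), ("u3", "i3"), ("u4", "i1")]


def build_adjacency(link_rows):
    """Rows are nodes; each link row adds an undirected user-item edge."""
    adj = defaultdict(list)
    for user_id, item_id in link_rows:
        adj[user_id].append(item_id)
        adj[item_id].append(user_id)
    return adj


def relational_attrs(user_id, adj):
    """Assumed relational facet: degree plus mean of neighbouring item attributes."""
    neigh = [ITEMS[i] for i in adj[user_id]]
    dim = len(next(iter(ITEMS.values())))
    mean = [sum(v[d] for v in neigh) / len(neigh) if neigh else 0.0 for d in range(dim)]
    return [float(len(neigh))] + mean


def standardize(rows):
    """Z-score each column so no single attribute dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id (pseudo-label) per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_pseudo_task_pools(granularities=(2, 3)):
    """One pool per facet; each granularity k yields one pseudo-classification task."""
    adj = build_adjacency(PURCHASES)
    ids = sorted(USERS)
    intrinsic = [USERS[u] for u in ids]
    relational = [relational_attrs(u, adj) for u in ids]
    facets = {
        "intrinsic": intrinsic,
        "relational": relational,
        "hybrid": [a + b for a, b in zip(intrinsic, relational)],
    }
    pools = {}
    for name, rows in facets.items():
        z = standardize(rows)
        pools[name] = {k: kmeans(z, k) for k in granularities if k <= len(z)}
    return ids, pools


if __name__ == "__main__":
    user_ids, task_pools = build_pseudo_task_pools()
    for facet, tasks in task_pools.items():
        for k, labels in tasks.items():
            print(facet, k, dict(zip(user_ids, labels)))
```

In this sketch each (facet, k) pair yields one set of pseudo-labels over the same user rows, so coarse and fine groupings of each perspective sit side by side in the pool. How RelPrism actually samples from its pools and trains an encoder on them is not described in this summary and is not modeled here.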

Jinyu Yang, Cheng Yang, Junze Chen, Zedi Liu, Muhan Zhang, Hanyang Peng, Chuan Shi (2026)

Related benchmarks

Task	Dataset	Result
Driver Top 3 Prediction	rel-f1	ROC-AUC 85	70
Driver DNF Prediction	rel-f1	ROC-AUC 0.756	67
User Churn Prediction	rel-amazon	ROC-AUC 0.631	64
Item Churn Prediction	rel-amazon	ROC-AUC 73.1	64
User Churn Prediction	rel-hm	ROC-AUC 67.1	62
User Badge Prediction	rel-stack	ROC-AUC 88.4	37
Entity Regression (study-adverse)	rel-trial	MAE 0.199	31
Entity Regression (post-votes)	rel-stack	MAE 0.12	19
Entity Classification (user-engagement)	rel-stack	ROC-AUC 90.6	17
Entity Regression (item-sales)	rel-hm	MAE 0.163	16

The table lists 10 of the 13 benchmark entries.
