
RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases

About

Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows are represented as nodes and inter-table interactions are represented as edges, and then applying graph-based models for representation learning. Despite the strong capability of RDL, effective self-supervised pre-training for RDBs remains non-trivial. RDB tasks often require multi-faceted information across different perspectives and granularities. For example, user churn classification may rely more on interaction patterns, whereas consumption value prediction requires both user-item behaviors and intrinsic user attributes for fine-grained regression. Such heterogeneous needs challenge RDB representation learning, as pre-training objectives should cover comprehensive information for downstream adaptation. However, existing SSL methods typically derive supervision from a single facet, such as node-level intrinsic attributes or subgraph-level relational structures, providing limited adaptability. To this end, we propose RelPrism, a multi-faceted self-supervised learning framework for RDBs. RelPrism constructs intrinsic, relational, and hybrid attributes from distinct perspectives, and applies multi-granularity clustering to each perspective to form corresponding pseudo-task pools. Pre-training over these pools exposes representations to broader perspectives and granularity levels, yielding a stronger basis for downstream adaptation. Experiments on 14 tasks across 5 real-world datasets show that RelPrism improves ROC-AUC by 4.15% for classification and reduces MAE by 10.75% for regression over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/RelPrism.

Jinyu Yang, Cheng Yang, Junze Chen, Zedi Liu, Muhan Zhang, Hanyang Peng, Chuan Shi • 2026
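The pipeline described in the abstract (rows as nodes, foreign-key links as edges, three attribute perspectives, multi-granularity clustering into pseudo-task pools) can be illustrated with a small sketch. This is not the authors' implementation: the toy tables, the choice of aggregates for the relational view, the use of plain k-means, and every function name (`build_edges`, `build_views`, `kmeans`, `build_task_pools`) are assumptions made only for illustration.

```python
import random
from collections import defaultdict

# Toy relational database (hypothetical): two tables linked by a foreign key.
users = {
    "u1": {"age": 23.0},
    "u2": {"age": 25.0},
    "u3": {"age": 61.0},
    "u4": {"age": 64.0},
}
orders = {
    "o1": {"user_id": "u1", "amount": 10.0},
    "o2": {"user_id": "u1", "amount": 12.0},
    "o3": {"user_id": "u2", "amount": 90.0},
    "o4": {"user_id": "u3", "amount": 11.0},
    "o5": {"user_id": "u4", "amount": 95.0},
}


def build_edges(child_rows, fk):
    """Rows become nodes; each foreign-key reference becomes an edge."""
    neighbors = defaultdict(list)
    for child_id, row in child_rows.items():
        neighbors[row[fk]].append(child_id)
    return neighbors


def build_views(users, orders, neighbors):
    """Assumed feature views: own columns, neighbor aggregates, and both."""
    views = {"intrinsic": {}, "relational": {}, "hybrid": {}}
    for uid, row in users.items():
        amounts = [orders[o]["amount"] for o in neighbors.get(uid, [])]
        mean_amount = sum(amounts) / len(amounts) if amounts else 0.0
        intrinsic = [row["age"]]
        relational = [float(len(amounts)), mean_amount]
        views["intrinsic"][uid] = intrinsic
        views["relational"][uid] = relational
        # Features are left unscaled here; a real pipeline would normalize.
        views["hybrid"][uid] = intrinsic + relational
    return views


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def build_task_pools(views, granularities=(2, 3)):
    """Cluster each view at several k; each clustering is one pseudo-task."""
    pools = {}
    for name, feats in views.items():
        ids = sorted(feats)
        points = [feats[i] for i in ids]
        pools[name] = {
            k: dict(zip(ids, kmeans(points, k))) for k in granularities
        }
    return pools


neighbors = build_edges(orders, "user_id")
views = build_views(users, orders, neighbors)
pools = build_task_pools(views)
```

Here `pools["relational"][3]` would hold a three-way pseudo-label for every user derived only from their order behavior, while `pools["intrinsic"][2]` holds a coarser two-way label derived only from the user's own columns. A pre-training loop could then sample pseudo-tasks from all three pools, which is the sense in which the representation is exposed to several perspectives and granularities.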

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Driver Top 3 Prediction | rel-f1 | ROC-AUC | 85 | 70 |
| Driver DNF Prediction | rel-f1 | ROC-AUC | 0.756 | 67 |
| User Churn Prediction | Amazon Rel | ROC-AUC | 0.631 | 64 |
| Item Churn Prediction | rel-amazon | ROC-AUC | 73.1 | 64 |
| User Churn Prediction | rel-hm | ROC-AUC | 67.1 | 62 |
| User Badge Prediction | Rel Stack User Badge | ROC-AUC | 88.4 | 37 |
| Entity Regression (study-adverse) | rel (trial) | MAE | 0.199 | 22 |
| Entity Regression (post-votes) | rel-stack | MAE | 0.12 | 19 |
| Entity Classification (user-engagement) | rel-stack | ROC-AUC | 90.6 | 17 |
| Entity Regression (item-sales) | rel-hm | MAE | 0.163 | 16 |
Showing 10 of 13 rows
