RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases
About
Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows are represented as nodes and inter-table interactions are represented as edges, and then applying graph-based models for representation learning. Despite the strong capability of RDL, effective self-supervised pre-training for RDBs remains non-trivial. RDB tasks often require multi-faceted information across different perspectives and granularities. For example, user churn classification may rely more on interaction patterns, whereas consumption value prediction requires both user-item behaviors and intrinsic user attributes for fine-grained regression. Such heterogeneous needs challenge RDB representation learning, as pre-training objectives should cover comprehensive information for downstream adaptation. However, existing self-supervised learning (SSL) methods typically derive supervision from a single facet, such as node-level intrinsic attributes or subgraph-level relational structures, which limits their adaptability. To this end, we propose RelPrism, a multi-faceted SSL framework for RDBs. RelPrism constructs intrinsic, relational, and hybrid attributes from distinct perspectives, and applies multi-granularity clustering to each perspective to form corresponding pseudo-task pools. Pre-training over these pools exposes representations to broader perspectives and granularity levels, yielding a stronger basis for downstream adaptation. Experiments on 14 tasks across 5 real-world datasets show that RelPrism improves ROC-AUC by 4.15% for classification and reduces MAE by 10.75% for regression over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/RelPrism.
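To make the pipeline in the abstract more concrete, the following is a minimal Python sketch of the two ideas it describes: turning rows and foreign-key links into a graph, and clustering per-facet attributes at several granularities to obtain pseudo-labels. It is not the authors' implementation. The toy tables, the three facet definitions (age as intrinsic, order count as relational, total spend as hybrid), the function names, and the use of plain 1-D k-means are all assumptions made for illustration.

```python
# Illustrative sketch only; table contents, facet choices, and names are assumptions.
users = [
    {"user_id": 1, "age": 23},
    {"user_id": 2, "age": 25},
    {"user_id": 3, "age": 61},
    {"user_id": 4, "age": 64},
]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 5.0},
    {"order_id": 11, "user_id": 1, "amount": 7.0},
    {"order_id": 12, "user_id": 3, "amount": 90.0},
    {"order_id": 13, "user_id": 4, "amount": 80.0},
    {"order_id": 14, "user_id": 4, "amount": 95.0},
]


def build_graph(users, orders):
    """Rows become nodes; each foreign-key reference becomes an edge."""
    nodes = [("users", u["user_id"]) for u in users]
    nodes += [("orders", o["order_id"]) for o in orders]
    edges = [(("orders", o["order_id"]), ("users", o["user_id"])) for o in orders]
    return nodes, edges


def assign(values, centers):
    """Index of the nearest center for every value (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in values]


def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on scalars with a deterministic quantile-style init."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centers)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return assign(values, centers)


def build_pseudo_task_pool(users, orders, granularities=(2, 3)):
    """One pseudo-label vector per (facet, number of clusters) pair."""
    facets = {
        # intrinsic: an attribute stored on the row itself
        "intrinsic": [u["age"] for u in users],
        # relational: how many order rows link to the user (node degree)
        "relational": [sum(o["user_id"] == u["user_id"] for o in orders) for u in users],
        # hybrid (assumed): neighbour attributes aggregated along the links
        "hybrid": [
            sum(o["amount"] for o in orders if o["user_id"] == u["user_id"])
            for u in users
        ],
    }
    return {(name, k): kmeans_1d(vals, k) for name, vals in facets.items() for k in granularities}


nodes, edges = build_graph(users, orders)
pool = build_pseudo_task_pool(users, orders)
```

Each entry of `pool` can be read as one self-generated classification task over the user rows; in this sketch, a coarse granularity (`k=2`) separates younger from older users on the intrinsic facet, while `k=3` splits the older group further.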
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Driver Top 3 Prediction | rel-f1 | ROC-AUC | 85 | 70 |
| Driver DNF Prediction | rel-f1 | ROC-AUC | 0.756 | 67 |
| User Churn Prediction | Amazon Rel | ROC-AUC | 0.631 | 64 |
| Item Churn Prediction | rel-amazon | ROC-AUC | 73.1 | 64 |
| User Churn Prediction | rel-hm | ROC-AUC | 67.1 | 62 |
| User Badge Prediction | Rel Stack User Badge | ROC-AUC | 88.4 | 37 |
| Entity Regression (study-adverse) | rel (trial) | MAE | 0.199 | 22 |
| Entity Regression (post-votes) | rel-stack | MAE | 0.12 | 19 |
| Entity Classification (user-engagement) | rel-stack | ROC-AUC | 90.6 | 17 |
| Entity Regression (item-sales) | rel-hm | MAE | 0.163 | 16 |