Unlocking Multi-Modal Potentials for Link Prediction on Dynamic Text-Attributed Graphs
About
Dynamic Text-Attributed Graphs (DyTAGs) are a novel graph paradigm that captures evolving temporal events (edges) alongside rich textual attributes. Existing studies can be broadly categorized into TGNN-driven and LLM-driven approaches, both of which encode textual attributes and temporal structures for DyTAG representation. We observe that DyTAGs inherently comprise three distinct modalities (temporal, textual, and structural) that often exhibit completely disjoint distributions. However, the first two modalities are largely overlooked by existing studies, leading to suboptimal performance. To address this, we propose MoMent, a multi-modal model that explicitly models, integrates, and aligns each modality to learn node representations for link prediction. Given the disjoint nature of the original modality distributions, we first construct modality-specific features and encode them with individual encoders to capture correlations across temporal patterns, semantic context, and local structures. Each encoder generates modality-specific tokens, which are then fused into comprehensive node representations with a theoretical guarantee. To keep these heterogeneous modalities from occupying disjoint subspaces, we propose a dual-domain alignment loss that first aligns their distributions globally and then fine-tunes coherence at the instance level. This yields more coherent representations across the temporal, textual, and structural views. Extensive experiments across seven datasets show that MoMent improves accuracy by up to 17.28% and runs up to 31× faster than eight baselines.
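To make the two-stage alignment idea concrete, here is a minimal sketch in plain Python. It is not the authors' loss: the abstract does not give the formulation, so the global term (squared distance between batch means) and the instance term (one minus cosine similarity between the views of the same node) are stand-ins chosen for illustration. The names `global_alignment`, `instance_alignment`, `alignment_loss`, and the `weight` parameter are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(rows: Sequence[Vector]) -> List[float]:
    """Column-wise mean of a batch of embeddings."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def global_alignment(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """ASSUMPTION: squared distance between batch means, a crude
    distribution-level proxy (the paper's global term may differ)."""
    mean_a, mean_b = mean_vector(a), mean_vector(b)
    return sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v + 1e-12)


def instance_alignment(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """ASSUMPTION: mean (1 - cosine) between two views of the same node."""
    return sum(1.0 - cosine(u, v) for u, v in zip(a, b)) / len(a)


def alignment_loss(
    temporal: Sequence[Vector],
    textual: Sequence[Vector],
    structural: Sequence[Vector],
    weight: float = 1.0,
) -> float:
    """Sum the global and instance-level terms over all three modality pairs."""
    views = [temporal, textual, structural]
    total = 0.0
    for i in range(len(views)):
        for j in range(i + 1, len(views)):
            total += global_alignment(views[i], views[j])
            total += weight * instance_alignment(views[i], views[j])
    return total


# Two nodes, 2-d embeddings. The structural view swaps the two nodes, so the
# batch means agree (global term is 0) while per-node embeddings disagree.
temporal = [[1.0, 0.0], [0.0, 1.0]]
textual = [[1.0, 0.0], [0.0, 1.0]]
structural = [[0.0, 1.0], [1.0, 0.0]]
loss = alignment_loss(temporal, textual, structural)
```

In this toy batch the global term is zero for every pair, yet the instance term is not, which illustrates why a distribution-level match alone may leave individual nodes misaligned across modalities.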
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dynamic Link Prediction | Enron (inductive) | AUC-ROC | 88.52 | 39 |
| Temporal Link Prediction | ICEWS1819 transductive | ROC-AUC | 0.9897 | 17 |
| Temporal Link Prediction | ICEWS inductive 1819 | ROC-AUC | 96.57 | 17 |
| Temporal Link Prediction | Googlemap CT transductive | ROC-AUC | 0.814 | 15 |
| Temporal Link Prediction | Googlemap CT inductive | ROC-AUC (%) | 75.22 | 15 |
| Dynamic Link Prediction | Enron (transductive) | AUC-ROC | 0.9696 | 12 |
| Destination Node Retrieval | GDELT (transductive) | Hits@3 | 67.59 | 9 |
| Dynamic Link Prediction | GDELT (transductive) | AUC-ROC | 96.21 | 9 |
| Destination Node Retrieval | GDELT (transductive) | Hits@1 | 43.1 | 9 |
| Dynamic Link Prediction | ICEWS 1819 (inductive) | AP | 96.63 | 9 |
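
For readers unfamiliar with the metrics reported above, the following sketch shows the standard definitions of Hits@k and ROC-AUC on toy scores. The exact evaluation protocol behind these benchmarks (negative sampling, tie handling) is not stated here, so the choices below are assumptions, and the function names are illustrative. Note that the listed results mix a 0 to 1 scale (e.g. 0.9897) with a percent scale (e.g. 96.57).

```python
from typing import Sequence


def hits_at_k(true_score: float, negative_scores: Sequence[float], k: int) -> int:
    """Return 1 if the true destination ranks within the top k candidates.

    ASSUMPTION: rank is 1 plus the number of negatives scored strictly higher.
    """
    rank = 1 + sum(1 for s in negative_scores if s > true_score)
    return int(rank <= k)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive edge outscores a random negative.

    Ties count as half a win, which is the usual convention.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# One query: the true destination scores 0.9, one negative scores higher.
miss_at_1 = hits_at_k(0.9, [0.95, 0.5, 0.1], k=1)  # rank 2, outside top 1
hit_at_3 = hits_at_k(0.9, [0.95, 0.5, 0.1], k=3)   # rank 2, inside top 3

# Three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8], [0.7, 0.85])
```

Here `auc` is 0.75 on the 0 to 1 scale, which would be written as 75.00 on the percent scale.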