Learning Noise-Resilient and Transferable Graph-Text Alignment via Dynamic Quality Assessment

About

Pre-training Graph Foundation Models (GFMs) on text-attributed graphs (TAGs) is central to web-scale applications such as search, recommendation, and knowledge discovery. However, existing CLIP-style graph-text aligners face two key limitations: they assume strict one-to-one correspondences between nodes and texts, overlooking the inherent many-to-many relations in real-world graphs; and they rely on static alignment objectives that cannot adapt to varying data quality, making them brittle under noisy supervision. Together, these limitations expose a core dilemma: embracing expressive many-to-many alignment amplifies noise, while reverting to strict one-to-one strategies sacrifices semantic diversity and fails to handle inherently mismatched pairs. To address these challenges, we propose ADAligner, a dynamic, quality-aware graph-text alignment framework that dynamically adjusts between expressive many-to-many and conservative one-to-one objectives according to supervision quality. ADAligner estimates batch-level alignment reliability in real time and adapts its optimization accordingly, promoting soft, subgraph-level many-to-many alignment when supervision is clean, while emphasizing reliable one-to-one alignment by dynamically filtering low-confidence pairs under noise. Theoretically, we prove that this dynamic mechanism forms a stable negative feedback process, ensuring convergence and robustness. Comprehensive experiments on nine diverse TAG datasets demonstrate that ADAligner consistently outperforms prior graph-text aligners on zero-/few-shot node classification, link prediction and cross-modal retrieval tasks. It maintains strong robustness under noisy supervision and accelerates pre-training by approximately 2 to 3 times compared to multimodal baselines, establishing a scalable and reliable foundation for graph-text representation learning in real-world web environments.

Yuhang Liu, Minglai Shao, Zengyi Wo, Yunlong Chu, Bing Hao, Shengzhong Liu, Ruijie Wang, Jianxin Li• 2025

Related benchmarks

Task	Dataset	Result
Node Classification	Cora	Accuracy60.51	609
Node Classification	Instagram	Accuracy54.84	140
Link Prediction	Cora	AUC (Cora)77.44	94
Node Classification	Books-History	Accuracy51.94	29
Node Classification	Ele-Computers	Accuracy50.45	29
Node Classification	Ele-Photo	Accuracy41.06	26
Link Prediction	wikiCS	AUC73.33	25
Link Prediction	History	AUC70.45	19

Showing 8 of 8 rows

Other info

Follow for update

@wizwand_team Discord