Toward General and Robust LLM-enhanced Text-attributed Graph Learning
About
Recent advancements in Large Language Models (LLMs) and the proliferation of Text-Attributed Graphs (TAGs) across various domains have positioned LLM-enhanced TAG learning as a critical research area. By utilizing rich graph descriptions, this paradigm leverages LLMs to generate high-quality embeddings, thereby enhancing the representational capacity of Graph Neural Networks (GNNs). However, the field faces significant challenges: (1) the absence of a unified framework to systematize the diverse optimization perspectives arising from the complex interactions between LLMs and GNNs, and (2) the lack of a robust method capable of handling real-world TAGs, which often suffer from text and edge sparsity, leading to suboptimal performance. To address these challenges, we propose UltraTAG, a unified pipeline for LLM-enhanced TAG learning. UltraTAG provides a comprehensive, domain-adaptive framework that not only organizes existing methodologies but also paves the way for future advancements in the field. Building on this framework, we propose UltraTAG-S, a robust instantiation of UltraTAG designed to tackle the inherent sparsity issues in real-world TAGs. UltraTAG-S employs LLM-based text propagation and text augmentation to mitigate text sparsity, while leveraging LLM-augmented node selection techniques based on PageRank and edge reconfiguration strategies to address edge sparsity. Our extensive experiments demonstrate that UltraTAG-S significantly outperforms existing baselines, achieving improvements of 2.12% and 17.47% in ideal and sparse settings, respectively. Moreover, as the data sparsity ratio increases, the performance improvement of UltraTAG-S also rises, which underscores the effectiveness and robustness of UltraTAG-S.
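The abstract mentions node selection based on PageRank without giving details. As a rough illustration only, the sketch below computes PageRank by power iteration on a toy graph and keeps the top-k nodes. It is not the UltraTAG-S implementation: the function names `pagerank` and `select_top_k`, the adjacency-dict input format, the damping factor of 0.85, and the toy graph are all assumptions made for this example, and the LLM-augmented part of the selection is not modelled.

```python
def pagerank(adj, damping=0.85, iters=100, tol=1e-10):
    """Plain power-iteration PageRank.

    adj maps each node to a list of its neighbours; every neighbour
    must also appear as a key. (Hypothetical input format.)
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def select_top_k(adj, k):
    """Return the k nodes with the highest PageRank (ties broken by node id)."""
    rank = pagerank(adj)
    return sorted(rank, key=lambda v: (-rank[v], v))[:k]


# Toy graph (illustrative): node 0 is a hub, node 4 is isolated.
toy = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0], 4: []}
selected = select_top_k(toy, 2)
```

On the toy graph the hub node ranks first. Selecting high-PageRank nodes in this way is one plausible reading of how important nodes could be prioritised when edges are sparse; the paper itself should be consulted for the actual procedure.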
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Node Classification | Cora (test) | Mean Accuracy | 90.96 | 951 |
| Node Classification | Cora | Accuracy | 88.34 | 583 |
| Node Classification | Reddit (test) | Accuracy | 63.78 | 201 |
| Node Classification | PubMed (test) | Accuracy | 92.41 | 162 |
| Node Classification | Photo | Accuracy | 84.69 | 153 |
| Node Classification | Wiki-CS (test) | Accuracy | 83.05 | 146 |
| Node Classification | Citeseer | Accuracy (%) | 77.52 | 105 |
| Node Classification | Instagram (test) | Accuracy | 66.69 | 39 |
| Node Classification | Elo-Photo (test) | Accuracy | 84.7 | 39 |