# VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention

## About
Graph Transformers have demonstrated impressive capabilities in graph representation learning. However, existing approaches face two critical challenges: (1) most models suffer from exponentially increasing computational complexity, making it difficult to scale to large graphs; (2) attention mechanisms based on node-level operations limit model flexibility and lead to poor generalization in out-of-distribution (OOD) scenarios. To address these issues, we propose **VecFormer** (the **Vec**tor Quantized Graph Trans**former**), an efficient and highly generalizable model for node classification, particularly under OOD settings. VecFormer adopts a two-stage training paradigm. In the first stage, two codebooks are used to reconstruct the node features and the graph structure, learning semantically rich `Graph Codes`. In the second stage, attention is performed at the `Graph Token` level over the transformed cross codebook, reducing computational complexity while enhancing the model's generalization capability. Extensive experiments on datasets of various sizes demonstrate that VecFormer outperforms existing Graph Transformers in both accuracy and speed.
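The two stages above can be sketched generically: stage one assigns each node feature to its nearest codebook entry (vector quantization, yielding discrete `Graph Codes`), and stage two attends from the N nodes to the K codebook tokens instead of to all N nodes, cutting attention cost from O(N²) to O(N·K). This is a minimal NumPy illustration of those two ideas, not the paper's actual implementation; all function names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def quantize(features, codebook):
    """Stage 1 (sketch): map each node feature to its nearest codebook entry.

    features: (N, d) node features; codebook: (K, d) learnable code vectors.
    Returns the discrete code indices ("Graph Codes") and quantized features.
    """
    # Pairwise squared L2 distances between nodes and codes: shape (N, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = dists.argmin(axis=1)        # (N,) discrete codes
    return codes, codebook[codes]       # (N, d) quantized features

def token_attention(features, codebook):
    """Stage 2 (sketch): nodes attend to the K codebook tokens, not to
    each other, so the score matrix is (N, K) rather than (N, N)."""
    d = features.shape[-1]
    scores = features @ codebook.T / np.sqrt(d)   # (N, K)
    return softmax(scores, axis=-1) @ codebook    # (N, d)

# Toy usage: 5 nodes with 4-dim features, a codebook of K = 3 tokens
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
C = rng.normal(size=(3, 4))
codes, Xq = quantize(X, C)
out = token_attention(X, C)
```

In the real model the codebooks are trained (with reconstruction losses on features and structure) and the attention uses learned query/key/value projections; the sketch only shows why token-level attention scales with the codebook size K rather than the graph size N.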
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Node Classification | Pubmed | Accuracy | 90.47 | 742 |
| Node Classification | Physics | Accuracy | 97.17 | 145 |
| Node Classification | CS | Accuracy | 95.54 | 128 |
| Node Classification | Cora Full | Accuracy | 72.14 | 88 |
| Node Classification | pokec (test) | Accuracy | 78.06 | 66 |
| Node Classification | Computer | Accuracy | 92.51 | 48 |
| Node Classification | Twitch (OOD) | AUROC | 68.14 | 36 |
| Node Classification | Photo | Accuracy | 95.84 | 23 |
| Node Classification | Cora (ID) | Accuracy | 97.91 | 21 |
| Node Classification | Cora (OOD) | Accuracy | 98.85 | 21 |