Hierarchical Graph Transformer with Adaptive Node Sampling

About

The Transformer architecture has achieved remarkable success in a number of domains, including natural language processing and computer vision. On graph-structured data, however, transformers have not achieved competitive performance, especially on large graphs. In this paper, we identify two main deficiencies of current graph transformers: (1) existing node sampling strategies are agnostic to the graph characteristics and to the training process; (2) most sampling strategies focus only on local neighbors and neglect long-range dependencies in the graph. We conduct experimental investigations on synthetic datasets to show that existing sampling strategies are sub-optimal. To tackle these problems, we formulate node sampling in the graph transformer as an adversarial bandit problem, in which the rewards are related to the attention weights and can vary over the course of training. Meanwhile, we propose a hierarchical attention scheme with graph coarsening that captures long-range interactions while reducing computational complexity. Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of our method over existing graph transformers and popular GNNs.
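The bandit formulation can be illustrated with a minimal EXP3-style sketch. This is a toy example, not the paper's actual algorithm: sampling probabilities mix learned per-node weights with uniform exploration, and the sampled node's weight receives an importance-weighted update from a reward in [0, 1] (which the paper derives from attention weights; here `reward_fn` is a stand-in).

```python
import numpy as np

def exp3_step(weights, reward_fn, gamma, rng):
    """One round of EXP3: sample a node, observe its reward,
    and return the updated weight vector."""
    n = len(weights)
    # Mix the normalized weights with uniform exploration (rate gamma).
    probs = (1 - gamma) * weights / weights.sum() + gamma / n
    node = rng.choice(n, p=probs)
    reward = reward_fn(node)  # in [0, 1]; e.g. an attention weight
    # Importance-weighted exponential update on the sampled arm only.
    new_weights = weights.copy()
    new_weights[node] *= np.exp(gamma * reward / (probs[node] * n))
    return node, new_weights

# Toy usage: node 2 consistently yields high reward, so its sampling
# probability should grow over the rounds.
rng = np.random.default_rng(0)
weights = np.ones(5)
for _ in range(300):
    _, weights = exp3_step(weights, lambda i: 1.0 if i == 2 else 0.1, 0.1, rng)
```

After a few hundred rounds, the weight of the high-reward node dominates, mirroring how attention-informed rewards would gradually concentrate sampling on informative nodes during training.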

Zaixi Zhang, Qi Liu, Qingyong Hu, Chee-Kong Lee • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Node Classification | Cora | Accuracy | 88 | 1215 |
| Node Classification | Citeseer | Accuracy | 75 | 931 |
| Node Classification | Cora (test) | Mean Accuracy | 88.69 | 861 |
| Node Classification | Citeseer (test) | Accuracy | 0.8013 | 824 |
| Node Classification | Chameleon | Accuracy | 54.6 | 640 |
| Node Classification | Wisconsin | Accuracy | 85.2 | 627 |
| Node Classification | Texas | Accuracy | 0.638 | 616 |
| Node Classification | Squirrel | Accuracy | 40.9 | 591 |
| Node Classification | Cornell | Accuracy | 59.8 | 582 |
| Node Classification | PubMed (test) | Accuracy | 89.75 | 546 |

Showing 10 of 35 rows
