Graph-Bert: Only Attention is Needed for Learning Graph Representations
About
The dominant graph neural networks (GNNs) over-rely on the graph links, several serious performance problems with which have been witnessed already, e.g., suspended animation problem and over-smoothing problem. What's more, the inherently inter-connected nature precludes parallelization within the graph, which becomes critical for large-sized graph, as memory constraints limit batching across the nodes. In this paper, we will introduce a new graph neural network, namely GRAPH-BERT (Graph based BERT), solely based on the attention mechanism without any graph convolution or aggregation operators. Instead of feeding GRAPH-BERT with the complete large input graph, we propose to train GRAPH-BERT with sampled linkless subgraphs within their local contexts. GRAPH-BERT can be learned effectively in a standalone mode. Meanwhile, a pre-trained GRAPH-BERT can also be transferred to other application tasks directly or with necessary fine-tuning if any supervised label information or certain application oriented objective is available. We have tested the effectiveness of GRAPH-BERT on several graph benchmark datasets. Based the pre-trained GRAPH-BERT with the node attribute reconstruction and structure recovery tasks, we further fine-tune GRAPH-BERT on node classification and graph clustering tasks specifically. The experimental results have demonstrated that GRAPH-BERT can out-perform the existing GNNs in both the learning effectiveness and efficiency.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Node Classification | Citeseer (test) | Accuracy0.712 | 729 | |
| Node Classification | Cora (test) | Mean Accuracy84.3 | 687 | |
| Node Classification | PubMed (test) | Accuracy79.3 | 500 | |
| Transductive Node Classification | Pubmed (transductive) | Accuracy79.3 | 95 | |
| Node Classification | Cora transductive (test) | Accuracy84.3 | 36 | |
| Node Classification | Citeseer transductive (test) | Accuracy71.2 | 28 |