Pure Transformers are Powerful Graph Learners
About
We show that standard Transformers without graph-specific modifications can yield promising results in graph learning, both in theory and in practice. Given a graph, we simply treat all nodes and edges as independent tokens, augment them with token embeddings, and feed them to a Transformer. With an appropriate choice of token embeddings, we prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers, which is already more expressive than all message-passing Graph Neural Networks (GNNs). When trained on a large-scale graph dataset (PCQM4Mv2), our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and competitive results against Transformer variants with sophisticated graph-specific inductive biases. Our implementation is available at https://github.com/jw9730/tokengt.
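The tokenization described above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration (not the reference implementation): each node becomes a token `[X_v, P_v, P_v]` and each edge `(u, v)` a token `[E_uv, P_u, P_v]`, where `P` are orthonormal node identifiers (random orthonormal rows here; the paper also uses Laplacian eigenvectors), plus a learned type embedding distinguishing node from edge tokens. The resulting sequence goes into a completely standard Transformer encoder. All class and parameter names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GraphTokenizer(nn.Module):
    """Sketch of TokenGT-style tokenization: one token per node and per edge,
    augmented with orthonormal node identifiers and a type embedding."""

    def __init__(self, feat_dim, id_dim, d_model):
        super().__init__()
        self.id_dim = id_dim  # assumes id_dim >= number of nodes per graph
        # each token is [features, identifier of endpoint 1, identifier of endpoint 2]
        self.proj = nn.Linear(feat_dim + 2 * id_dim, d_model)
        self.type_emb = nn.Embedding(2, d_model)  # 0 = node token, 1 = edge token

    def forward(self, x, edge_index, edge_attr):
        n = x.size(0)
        # random orthonormal node identifiers: rows of an orthogonal matrix
        node_id = torch.linalg.qr(torch.randn(self.id_dim, self.id_dim))[0][:n]
        src, dst = edge_index
        node_tok = torch.cat([x, node_id, node_id], dim=-1)              # [X_v, P_v, P_v]
        edge_tok = torch.cat([edge_attr, node_id[src], node_id[dst]], dim=-1)  # [E_uv, P_u, P_v]
        tokens = self.proj(torch.cat([node_tok, edge_tok], dim=0))
        types = torch.cat([torch.zeros(n, dtype=torch.long),
                           torch.ones(src.size(0), dtype=torch.long)])
        return tokens + self.type_emb(types)

# feed the token sequence to a vanilla (graph-agnostic) Transformer encoder
tokenizer = GraphTokenizer(feat_dim=8, id_dim=8, d_model=32)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
x = torch.randn(5, 8)                               # 5 nodes with 8-dim features
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])   # 3 directed edges
edge_attr = torch.randn(3, 8)                       # 8-dim edge features
seq = tokenizer(x, edge_index, edge_attr).unsqueeze(0)  # (1, 5 + 3, 32)
out = encoder(seq)
print(out.shape)  # torch.Size([1, 8, 32])
```

Note that nothing in the encoder knows it is processing a graph: all graph structure enters solely through the node-identifier embeddings attached to each token.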
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Node Classification | Chameleon | Accuracy | 38.1 | 549 |
| Node Classification | Squirrel | Accuracy | 29.4 | 500 |
| Graph Classification | NCI1 | Accuracy | 76.7 | 460 |
| Graph Classification | IMDB-B | Accuracy | 80.2 | 322 |
| Node Classification | Citeseer | Accuracy | 47 | 275 |
| Graph Classification | NCI109 | Accuracy | 72.1 | 223 |
| Graph Classification | IMDB-M | Accuracy | 47 | 218 |
| Graph Regression | ZINC (test) | MAE | 0.047 | 204 |
| Graph Classification | DD | Accuracy | 73.9 | 175 |
| Graph Regression | OGB-LSC PCQM4M v2 (val) | MAE | 0.091 | 81 |