GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings
About
We present GNNAutoScale (GAS), a framework for scaling arbitrary message-passing GNNs to large graphs. GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations, leading to constant GPU memory consumption with respect to input node size without dropping any data. While existing solutions weaken the expressive power of message passing due to sub-sampling of edges or non-trainable propagations, our approach is provably able to maintain the expressive power of the original GNN. We achieve this by providing approximation error bounds of historical embeddings and showing how to tighten them in practice. Empirically, we show that the practical realization of our framework, PyGAS, an easy-to-use extension for PyTorch Geometric, is both fast and memory-efficient, learns expressive node representations, closely matches the performance of its non-scaling counterparts, and reaches state-of-the-art performance on large-scale graphs.
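The historical-embedding idea can be illustrated with a toy example. The following is a minimal sketch in plain Python, not the PyGAS API: the function names (`mean_aggregate`, `forward_batch`), the scalar embeddings, the parameter-free mean aggregation, and the four-node path graph are all assumptions chosen for illustration. In-batch nodes use embeddings computed in the current step, while out-of-batch neighbors contribute values pulled from a history table, so their own neighborhoods are never expanded.

```python
from typing import Callable, Dict, List, Set


def mean_aggregate(v: int, neighbors: Dict[int, List[int]],
                   lookup: Callable[[int], float]) -> float:
    """Average the embedding of node v and of its neighbors."""
    vals = [lookup(v)] + [lookup(w) for w in neighbors[v]]
    return sum(vals) / len(vals)


def forward_batch(batch: Set[int], neighbors: Dict[int, List[int]],
                  x: Dict[int, float], history: Dict[int, float]) -> Dict[int, float]:
    """Two-layer forward pass over one mini-batch (hypothetical toy model)."""
    # Layer 1: input features are exact for every node, so no history is needed.
    h1 = {v: mean_aggregate(v, neighbors, lambda u: x[u]) for v in batch}

    # Layer 2: pull stale layer-1 embeddings for out-of-batch neighbors
    # instead of recomputing them from their own neighborhoods.
    def pull(u: int) -> float:
        return h1[u] if u in batch else history[u]

    h2 = {v: mean_aggregate(v, neighbors, pull) for v in batch}

    # Push the fresh layer-1 embeddings of in-batch nodes into the history.
    history.update(h1)
    return h2


# Path graph 0 - 1 - 2 - 3 with scalar input features (illustrative data).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
history = {v: 0.0 for v in neighbors}  # histories start out empty (zeros)

first = forward_batch({0, 1}, neighbors, x, history)   # node 2 is still stale
forward_batch({2, 3}, neighbors, x, history)           # fills history for 2, 3
second = forward_batch({0, 1}, neighbors, x, history)  # node 2 is now up to date
```

In this sketch the output for node 1 changes between `first` and `second` only because the history entry for node 2 was refreshed in between. Since the toy model has no trainable parameters, the history becomes exact after one full pass; in actual training the parameters change between iterations, which is where the approximation error discussed in the abstract comes from.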
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Node Classification | Cora | Macro-F1 | 43.45 | 30 |
| Node Classification | Citeseer | F1 Score | 39.72 | 27 |
| AML Node Classification | Synthetic AML HI-Small | Average F1 Score | 54.36 | 12 |
| AML Node Classification | Synthetic AML HI-Medium | Average F1 | 56.12 | 12 |
| AML Node Classification | Synthetic AML LI-Small | Average F1 Score | 16.14 | 12 |
| AML Node Classification | Synthetic AML LI-Medium | Avg F1 Score | 0.1129 | 12 |
| AML Node Classification | Synthetic AML LI-Large | Average F1 Score | 0.00e+0 | 12 |
| AML Node Classification | Synthetic AML HI-Large | Average F1 Score | 52.1 | 12 |
| Node Classification | Pubmed | Average F1 | 63.47 | 8 |
| Node Classification | MSAcademic | Average F1 | 81.68 | 8 |