Inductive Representation Learning on Large Graphs
About
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Node Classification | Cora | Accuracy87.77 | 885 | |
| Node Classification | Citeseer | Accuracy77.27 | 804 | |
| Graph Classification | PROTEINS | Accuracy76.2 | 742 | |
| Node Classification | Pubmed | Accuracy88.5 | 742 | |
| Node Classification | Citeseer (test) | Accuracy0.7734 | 729 | |
| Graph Classification | MUTAG | Accuracy86.35 | 697 | |
| Node Classification | Cora (test) | Mean Accuracy86.9 | 687 | |
| Node Classification | Chameleon | Accuracy67.92 | 549 | |
| Node Classification | PubMed (test) | Accuracy90.02 | 500 | |
| Node Classification | Squirrel | Accuracy47.42 | 500 |