Inductive Representation Learning on Large Graphs

About

Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
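The sample-and-aggregate idea in the abstract can be illustrated with a minimal sketch. It assumes a mean aggregator, a single layer, a ReLU nonlinearity, and L2 normalization of the output. The abstract does not fix any of these choices, and the names `sample_neighbors`, `embed`, `adj`, `features`, and `weights` are hypothetical, not taken from the authors' code. The weights here are random rather than trained. The point is only that an embedding is computed from features and neighbors, so it can be produced for a node that had no embedding of its own during training.

```python
import numpy as np

def sample_neighbors(adj, node, num_samples, rng):
    """Draw a fixed-size neighbor sample (with replacement if too few)."""
    neighbors = adj[node]
    if not neighbors:
        return [node]  # assumption: an isolated node falls back to itself
    replace = len(neighbors) < num_samples
    return list(rng.choice(neighbors, size=num_samples, replace=replace))

def embed(adj, features, weights, node, num_samples, rng):
    """One sample-and-aggregate step using a mean aggregator (an assumption)."""
    sampled = sample_neighbors(adj, node, num_samples, rng)
    neigh_mean = features[sampled].mean(axis=0)           # aggregate neighbor features
    combined = np.concatenate([features[node], neigh_mean])
    hidden = np.maximum(weights @ combined, 0.0)          # linear map + ReLU
    norm = np.linalg.norm(hidden)
    return hidden / norm if norm > 0 else hidden          # L2-normalize

# Toy graph: node 3 stands in for a node that arrives after training.
rng = np.random.default_rng(0)
adj = {0: [1, 2], 1: [0], 2: [0], 3: [0, 1]}
features = rng.normal(size=(4, 5))    # 4 nodes, 5 input features each
weights = rng.normal(size=(8, 10))    # maps concat(self, neighbors) to 8 dims
z_new = embed(adj, features, weights, node=3, num_samples=2, rng=rng)
```

Because `embed` depends only on node features and the shared `weights`, the same function applies unchanged to any node, or to an entirely new graph, as long as the feature dimension matches.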

William L. Hamilton, Rex Ying, Jure Leskovec • 2017

Related benchmarks

Task	Dataset	Result
Graph Classification	PROTEINS	Accuracy 76.3	1252
Node Classification	Cora	Accuracy 87.77	1215
Graph Classification	MUTAG	Accuracy 86.35	1103
Node Classification	Citeseer	Accuracy 77.27	1037
Node Classification	Cora (test)	Mean Accuracy 86.9	951
Node Classification	Citeseer (test)	Accuracy 0.7734	945
Node Classification	Chameleon	Accuracy 67.92	867
Node Classification	Pubmed	Accuracy 90.54	865
Node Classification	Wisconsin	Accuracy 81.6	864
Node Classification	Cornell	Accuracy 81.18	851

Showing 10 of 1144 rows
