
Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

About

Many widely used datasets for graph machine learning tasks have generally been homophilous, where nodes with similar labels connect to each other. Recently, new Graph Neural Networks (GNNs) have been developed that move beyond the homophily regime; however, their evaluation has often been conducted on small graphs with limited application domains. We collect and introduce diverse non-homophilous datasets from a variety of application areas that have up to 384x more nodes and 1398x more edges than prior datasets. We further show that existing scalable graph learning and graph minibatching techniques lead to performance degradation on these non-homophilous datasets, thus highlighting the need for further work on scalable non-homophilous methods. To address these concerns, we introduce LINKX -- a strong simple method that admits straightforward minibatch training and inference. Extensive experimental results with representative simple methods and GNNs across our proposed datasets show that LINKX achieves state-of-the-art performance for learning on non-homophilous graphs. Our code and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale.
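
To make the notion of homophily in the abstract concrete, here is a minimal sketch of the commonly used edge homophily ratio: the fraction of edges whose two endpoints share a label. This is an illustrative assumption, not necessarily the measure the authors use. The function name `edge_homophily` and the toy graph are hypothetical.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges whose two endpoints share a label.

    edges  -- list of (u, v) node-index pairs
    labels -- list where labels[i] is the class of node i
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical toy graph: 4 nodes, 2 classes, 4 edges.
labels = [0, 0, 1, 1]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

# Edges (0, 1) and (2, 3) join same-label nodes; the other two do not.
print(edge_homophily(edges, labels))  # 0.5
```

A value near 1 indicates a homophilous graph, while a value near 0 indicates that most edges connect nodes with different labels.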

Derek Lim, Felix Hohne, Xiuyu Li, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, Ser-Nam Lim • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Node Classification | Cora | Accuracy | 87.86 | 1215 |
| Node Classification | Citeseer | Accuracy | 73.19 | 1037 |
| Node Classification | Cora (test) | Mean Accuracy | 62.66 | 951 |
| Node Classification | Citeseer (test) | Accuracy | 0.5366 | 945 |
| Node Classification | Chameleon | Accuracy | 71.14 | 867 |
| Node Classification | Pubmed | Accuracy | 87.86 | 865 |
| Node Classification | Wisconsin | Accuracy | 75.5 | 864 |
| Node Classification | Cornell | Accuracy | 77.84 | 851 |
| Node Classification | Texas | Accuracy | 74.6 | 801 |
| Node Classification | Squirrel | Accuracy | 61.81 | 786 |
Showing 10 of 140 rows