A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

About

Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized methods are required to achieve strong performance on such graphs. In this work, we challenge this assumption. First, we show that the standard datasets used for evaluating heterophily-specific models have serious drawbacks, making results obtained by using them unreliable. The most significant of these drawbacks is the presence of a large number of duplicate nodes in the datasets Squirrel and Chameleon, which leads to train-test data leakage. We show that removing duplicate nodes strongly affects GNN performance on these datasets. Then, we propose a set of heterophilous graphs of varying properties that we believe can serve as a better benchmark for evaluating the performance of GNNs under heterophily. We show that standard GNNs achieve strong results on these heterophilous graphs, almost always outperforming specialized models. Our datasets and the code for reproducing our experiments are available at https://github.com/yandex-research/heterophilous-graphs
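The two measurable ideas in the abstract are homophily (edges tend to connect nodes of the same class) and train-test leakage through duplicate nodes. The sketch below illustrates both with plain Python. It computes edge homophily, a common measure defined as the fraction of edges whose endpoints share a label, and it flags test nodes that have an exact duplicate in the training split. The function names are hypothetical. The duplicate criterion used here, identical feature vectors, is an assumption for illustration and is not necessarily the criterion the authors used.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label.

    edges:  iterable of (u, v) node-index pairs, each undirected edge listed once
    labels: sequence where labels[i] is the class of node i
    """
    edges = list(edges)
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def find_duplicate_groups(features):
    """Group node indices whose feature vectors are exactly identical.

    Assumption: "duplicate" means identical features only; a stricter check
    could also compare neighbourhoods and labels.
    """
    groups = defaultdict(list)
    for node, vec in enumerate(features):
        groups[tuple(vec)].append(node)
    return [nodes for nodes in groups.values() if len(nodes) > 1]


def leaked_test_nodes(duplicate_groups, train_idx, test_idx):
    """Test nodes that share a duplicate group with at least one training node."""
    train, test = set(train_idx), set(test_idx)
    leaked = set()
    for group in duplicate_groups:
        if train.intersection(group):
            leaked.update(n for n in group if n in test)
    return sorted(leaked)


if __name__ == "__main__":
    # A 4-cycle where only half of the edges join same-class nodes.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    labels = [0, 0, 1, 1]
    print(edge_homophily(edges, labels))  # 0.5

    # Nodes 2 and 3 copy the features of training nodes 0 and 1.
    features = [[1, 0], [0, 1], [1, 0], [0, 1]]
    groups = find_duplicate_groups(features)
    print(groups)  # [[0, 2], [1, 3]]
    print(leaked_test_nodes(groups, train_idx=[0, 1], test_idx=[2, 3]))  # [2, 3]
```

In this toy example every test node has a training-set twin, so a model that memorises training features would score perfectly on the test split without learning anything about the graph. That is the kind of inflation that removing duplicates is meant to expose.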

Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova (2023)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Node Classification | Chameleon | Accuracy | 77.85 | 549 |
| Node Classification | Squirrel | Accuracy | 68.93 | 500 |
| Node Classification | Pubmed | Accuracy | 78.7 | 307 |
| Node Classification | Citeseer | Accuracy | 71.9 | 275 |
| Node Classification | Actor | Accuracy | 34.61 | 237 |
| Graph Regression | Peptides-struct LRGB (test) | MAE | 0.246 | 178 |
| Node Classification | Amazon Photo | Accuracy | 91.4 | 150 |
| Node Classification | amazon-ratings | Accuracy | 52.7 | 138 |
| Graph Classification | Peptides-func LRGB (test) | AP | 0.686 | 136 |
| Node Classification | Roman-Empire | Accuracy | 88.75 | 135 |

(Showing 10 of 47 rows.)
