TUDataset: A collection of benchmark datasets for learning with graphs

About

Recently, there has been an increasing interest in (supervised) learning with graph data, especially using graph neural networks. However, the development of meaningful benchmark datasets and standardized evaluation procedures is lagging, consequently hindering advancements in this area. To address this, we introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets and the standardized evaluation procedures, and we provide baseline experiments. All datasets are available at www.graphlearning.io. The experiments are fully reproducible from the code available at www.github.com/chrsmrrs/tudataset.

Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, Marion Neumann • 2020

Related benchmarks

Task	Dataset	Metric	Result
Graph Classification	PROTEINS	Accuracy	78.5	1383
Graph Classification	MUTAG	Accuracy	85.8	1229
Graph Classification	NCI1	Accuracy	87.5	707
Graph Classification	DD	Accuracy	82	309
Graph Classification	NCI109	Accuracy	85.9	275
Graph Classification	PROTEINS (10-fold cross-validation)	Accuracy	73	223
Graph Classification	NCI1 (10-fold cross-validation)	Accuracy	80	119
Graph Classification	ENZYMES (10-fold cross-validation)	Accuracy	43.17	94
Graph Classification	Molhiv (scaffold)	ROC-AUC	0.755	19
Graph Classification	ogbg-ppa (random species split)	Accuracy	68.9	2
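
Several rows above report accuracy under 10-fold cross-validation. As a rough illustration of what such a protocol computes, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names (`k_fold_indices`, `fit_majority`, `cross_validate`) are hypothetical, the majority-class predictor merely stands in for a kernel or graph neural network baseline, and the toy data is synthetic. It also omits hyperparameter selection and repeated runs, which a full evaluation procedure would normally include.

```python
import random
import statistics
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_majority(train_graphs, train_labels):
    """Stand-in 'model' (an assumption): always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda graph: majority


def cross_validate(graphs, labels, fit, k=10, seed=0):
    """Return mean and standard deviation of test accuracy (in percent) over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(graphs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(graphs)) if i not in held_out]
        predict = fit([graphs[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(graphs[i]) == labels[i] for i in test_idx)
        accuracies.append(100.0 * correct / len(test_idx))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Synthetic toy data: 100 placeholder "graphs" with a 60/40 class split.
toy_graphs = list(range(100))
toy_labels = [0] * 60 + [1] * 40

mean_acc, std_acc = cross_validate(toy_graphs, toy_labels, fit_majority)
print(f"accuracy: {mean_acc:.2f} +/- {std_acc:.2f}")
```

Replacing `fit_majority` with any function that takes training graphs and labels and returns a predictor would evaluate that model under the same splits, which is the point of fixing the fold assignment with a seed.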
