Adversarial Graph Neural Network Benchmarks: Towards Practical and Fair Evaluation

About

Adversarial learning and the robustness of Graph Neural Networks (GNNs) are topics of widespread interest in the machine learning community, as documented by the number of adversarial attacks and defenses designed for these purposes. While a rigorous evaluation of these adversarial methods is necessary to understand the robustness of GNNs in real-world applications, we posit that many works in the literature do not share the same experimental settings, leading to ambiguous and potentially contradictory scientific conclusions. In this benchmark, we demonstrate the importance of adopting fair, robust, and standardized evaluation protocols in adversarial GNN research. We perform a comprehensive re-evaluation of seven widely used attacks and eight recent defenses under both poisoning and evasion scenarios, across six popular graph datasets. Our study spans over 453,000 experiments conducted within a unified framework. We observe substantial differences in adversarial attack performance when evaluated under a fair and robust procedure. Our findings reveal that previously overlooked factors, such as target node selection and the training process of the attacked model, have a profound impact on attack effectiveness, to the extent of completely distorting performance insights. These results underscore the urgent need for standardized evaluations in adversarial graph machine learning.

Tran Gia Bao Ngo, Zulfikar Alom, Federico Errica, Murat Kantarcioglu, Cuneyt Gurcan Akcora • 2026

Related benchmarks

Task	Dataset	Result
Node Classification	SQUIRREL vanilla (test)	Misclassification Rate: 7.87	322
Node Classification	Cora	Misclassification Rate (Δ=1): 13.2	96
Node Classification	Pubmed	Misclassification Rate (Δ=1): 9.73	96
Node Classification	Cora	Misclassification Rate: 28.4	70
Node Classification	Pubmed	Misclassification Rate: 25.73	70
Node Classification	OGB-ARXIV (test)	Misclassification Rate: 16	60
Node Classification	CHAMELEON vanilla (test)	Misclassification Rate (Δ=1): 2.27	50
Node Classification	Citeseer	--	40
Node Classification	Chameleon	Average Rank: 1	34
Node Classification	Cora (test)	Error Rate (Δ=1): 6.27	28

Showing 10 of 18 rows
