
Adversarial Graph Neural Network Benchmarks: Towards Practical and Fair Evaluation

About

Adversarial learning and the robustness of Graph Neural Networks (GNNs) are topics of widespread interest in the machine learning community, as documented by the number of adversarial attacks and defenses designed for these purposes. While a rigorous evaluation of these adversarial methods is necessary to understand the robustness of GNNs in real-world applications, we posit that many works in the literature do not share the same experimental settings, leading to ambiguous and potentially contradictory scientific conclusions. In this benchmark, we demonstrate the importance of adopting fair, robust, and standardized evaluation protocols in adversarial GNN research. We perform a comprehensive re-evaluation of seven widely used attacks and eight recent defenses under both poisoning and evasion scenarios, across six popular graph datasets. Our study spans over 453,000 experiments conducted within a unified framework. We observe substantial differences in adversarial attack performance when evaluated under a fair and robust procedure. Our findings reveal that previously overlooked factors, such as target node selection and the training process of the attacked model, have a profound impact on attack effectiveness, to the extent of completely distorting performance insights. These results underscore the urgent need for standardized evaluations in adversarial graph machine learning.

Tran Gia Bao Ngo, Zulfikar Alom, Federico Errica, Murat Kantarcioglu, Cuneyt Gurcan Akcora • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Node Classification | SQUIRREL vanilla (test) | Miss-classification Rate: 7.87 | 322 |
| Node Classification | Cora | Miss-classification Rate (Δ=1): 13.2 | 96 |
| Node Classification | Pubmed | Miss-classification Rate (Δ=1): 9.73 | 96 |
| Node Classification | Cora | Miss-classification Rate: 28.4 | 70 |
| Node Classification | Pubmed | Miss-classification Rate: 25.73 | 70 |
| Node Classification | OGB-ARXIV (test) | Misclassification Rate: 16 | 60 |
| Node Classification | CHAMELEON vanilla (test) | Miss-classification Rate (Δ=1): 2.27 | 50 |
| Node Classification | Citeseer | -- | 40 |
| Node Classification | Chameleon | Average Rank: 1 | 34 |
| Node Classification | Cora (test) | Error Rate (Δ=1): 6.27 | 28 |

Showing 10 of 18 rows
