IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors

About

Graph Neural Networks (GNNs) have shown great performance in various tasks, with the core idea of learning from data labels and aggregating messages within the neighborhood of nodes. However, graphs commonly pose two challenges: insufficient accurate (high-quality) labels and limited neighbors for nodes, both of which result in weak GNNs. Existing graph augmentation methods typically address only one of these challenges, often adding training costs or relying on oversimplified or knowledge-intensive strategies, which limits their generalization. To address both challenges simultaneously and in a generalized way, we propose an elegant method called IntraMix. Considering the incompatibility of vanilla Mixup with the complex topology of graphs, IntraMix applies Mixup among inaccurately labeled data of the same class, generating high-quality labeled data at minimal cost. Additionally, it finds data with high confidence of being clustered into the same group as the generated data to serve as their neighbors, thereby enriching the neighborhoods of graphs. IntraMix efficiently tackles both issues and challenges the prior notion that Mixup has limited effectiveness in node classification. IntraMix is a theoretically grounded plug-and-play method that can be readily applied to all GNNs. Extensive experiments demonstrate the effectiveness of IntraMix across various GNNs and datasets. Our code is available at: https://github.com/Zhengsh123/IntraMix.
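The two steps described in the abstract (Mixup within a class, then attaching confident same-class nodes as neighbors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; see the linked repository for that. The function names, the Beta(alpha, alpha) mixing weight, the confidence threshold, and the number of neighbors k are all assumptions made for illustration, and labels are assumed to be integer class ids that index the columns of the probability matrix.

```python
import numpy as np


def intra_class_mixup(features, labels, num_new_per_class, alpha=2.0, rng=None):
    """Generate new samples by mixing pairs of rows that share a (pseudo-)label.

    Hypothetical helper: alpha and the Beta-distributed weight are assumptions.
    Returns (new_features, new_labels).
    """
    rng = np.random.default_rng() if rng is None else rng
    new_feats, new_labels = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size < 2:
            continue  # a pair needs two distinct samples of the class
        for _ in range(num_new_per_class):
            i, j = rng.choice(idx, size=2, replace=False)
            lam = rng.beta(alpha, alpha)  # mixing weight in (0, 1)
            new_feats.append(lam * features[i] + (1.0 - lam) * features[j])
            new_labels.append(c)  # both parents share class c, so the label stays c
    if not new_feats:
        return np.empty((0, features.shape[1])), np.empty((0,), dtype=labels.dtype)
    return np.stack(new_feats), np.asarray(new_labels, dtype=labels.dtype)


def confident_neighbors(probs, new_labels, threshold=0.9, k=2, rng=None):
    """Link each generated node to up to k existing nodes that a model predicts
    as the same class with confidence >= threshold.

    Hypothetical helper: threshold and k are illustrative choices.
    Returns a list of (generated_index, existing_node_index) pairs.
    """
    rng = np.random.default_rng() if rng is None else rng
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    edges = []
    for n, c in enumerate(new_labels):
        candidates = np.flatnonzero((pred == c) & (conf >= threshold))
        if candidates.size == 0:
            continue  # no sufficiently confident node of this class
        chosen = rng.choice(candidates, size=min(k, candidates.size), replace=False)
        edges.extend((n, int(v)) for v in chosen)
    return edges
```

In this sketch the generated label never needs interpolation because both parents belong to the same class, which is the property that distinguishes intra-class Mixup from vanilla Mixup. The returned edge list would still have to be merged into the graph's adjacency structure before training a GNN.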

Shenghe Zheng, Hongzhi Wang, Xianglong Liu • 2024

Related benchmarks

Task	Dataset	Result
Node Classification	Cora (semi-supervised)	Accuracy 85.99	103
Node Classification	Cite semi-supervised	Accuracy 75.25	61
Node Classification	PubMed semi-supervised	Accuracy 82.98	42
Node Classification	CS semi-supervised	Accuracy 93.24	30
Node Classification	Physics semi-supervised	Accuracy 94.87	30
Node Classification	CORA inductive setting (test)	Accuracy 83.8	22
Node Classification	CITESEER inductive setting (test)	Accuracy 73.9	21
Semi-supervised node classification	Ogbn-arxiv	Accuracy 0.7073	20
Node Classification	ogbn-arxiv semi-supervised 1% training size	Accuracy 65.32	15
Node Classification	ogbn-arxiv full-supervised 100% training size	Accuracy 73.85	15

Showing 10 of 13 rows
