$P^2$GNN: Two Prototype Sets to boost GNN Performance
About
Message Passing Graph Neural Networks (MP-GNNs) have garnered attention for addressing various industry challenges, such as user recommendation and fraud detection. However, they face two major hurdles: (1) a heavy reliance on local context, which often leaves them without information about the global context or graph-level features, and (2) an assumption of strong homophily among connected nodes, which makes them struggle with noisy local neighborhoods. To tackle these, we introduce $P^2$GNN, a plug-and-play technique that leverages prototypes to optimize message passing and enhance the performance of the base GNN model. Our approach uses the prototypes in two ways: (1) as universally accessible neighbors for all nodes, which enriches global context, and (2) as clustered targets to which messages are aligned, which offers a denoising effect. We demonstrate the extensibility of our proposed method to all message-passing GNNs and conduct extensive experiments across 18 datasets, including proprietary e-commerce datasets and open-source datasets, on node recommendation and node classification tasks. Results show that $P^2$GNN outperforms production models in e-commerce and achieves the top average rank on open-source datasets, establishing it as a leading approach. Qualitative analysis supports the value of global context and of noise mitigation in the local neighborhood for enhancing performance.
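The two roles of prototypes described in the abstract can be illustrated with a toy message-passing step. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`align_to_prototypes`, `prototype_layer`), the mean aggregator, the dot-product softmax used for alignment, and the fixed (non-learned) prototype vectors are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vectors(vectors):
    # Element-wise mean of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def align_to_prototypes(message, prototypes):
    # Denoising view (assumed form): replace a message with a
    # softmax-weighted mix of the clustered prototypes.
    weights = softmax([dot(message, p) for p in prototypes])
    dim = len(message)
    return [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(dim)]

def prototype_layer(features, neighbors, global_protos, cluster_protos):
    # One mean-aggregation step using two hypothetical prototype sets.
    updated = []
    for node, feat in enumerate(features):
        # Local messages, each aligned to the clustered prototypes.
        messages = [align_to_prototypes(features[j], cluster_protos)
                    for j in neighbors[node]]
        # Global-context view: prototypes are extra neighbors of every node.
        messages.extend(global_protos)
        messages.append(feat)  # self-loop
        updated.append(mean_vectors(messages))
    return updated

# Toy graph: node 0 is linked to nodes 1 and 2.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
neighbors = [[1, 2], [0], [0]]
global_protos = [[0.5, 0.5]]
cluster_protos = [[1.0, 0.0], [0.0, 1.0]]

out = prototype_layer(features, neighbors, global_protos, cluster_protos)
```

In this sketch every node sees `global_protos` regardless of graph distance, and a neighbor's message is pulled toward whichever clustered prototype it most resembles. In a trained model the prototypes would presumably be learned jointly with the base GNN rather than fixed by hand.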
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Node Classification | arXiv-year | Accuracy | 54.88 | 112 |
| Node Classification | Cornell (60%/20%/20% random) | Accuracy | 95.41 | 95 |
| Node Classification | Cora (60/20/20 random split) | Accuracy | 89.89 | 91 |
| Node Classification | Texas (48/32/20) | Mean Accuracy | 88.65 | 78 |
| Node Classification | Chameleon (60%/20%/20% random) | Accuracy | 69.87 | 72 |
| Node Classification | Wisconsin (48/32/20) | Mean Accuracy | 88.43 | 66 |
| Node Classification | Cornell (48/32/20) | Mean Accuracy | 86.49 | 66 |
| Node Classification | Citeseer (48/32/20) | Mean Accuracy (%) | 77.51 | 66 |
| Node Classification | Texas (60% 20% 20% random splits) | Accuracy | 96.72 | 62 |
| Node Classification | Chameleon (48/32/20) | Mean Accuracy | 78.6 | 49 |