Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Distinguished In Uniform: Self Attention Vs. Virtual Nodes

About

Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: Graphs of different sizes may require a different approximating network. We first clarify that this form of universality is not unique to GTs: Using the same positional encodings, also pure MPGNNs and even 2-layer MLPs are non-uniform universal approximators. We then consider uniform expressivity: The target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is in their global computation method -- Self-Attention Vs Virtual Node. We prove that none of the models is a uniform-universal approximator, before proving our main result: Neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data. We further augment our study with real-world datasets, observing mixed results which indicate no clear ranking in practice as well.

Eran Rosenbluth, Jan T\"onshoff, Martin Ritzert, Berke Kisin, Martin Grohe• 2024

Related benchmarks

TaskDatasetResultRank
Graph Classificationogbg-molpcba (test)
AP29.07
206
Graph RegressionPeptides struct LRGB (test)
MAE0.2458
178
Graph ClassificationPeptides-func LRGB (test)
AP0.6822
136
Superpixel classificationPascalVOC-SP LRGB (test)
F1 Score44.4
9
Superpixel classificationCOCO-SP LRGB (test)
F1 Score38.84
8
Showing 5 of 5 rows

Other info

Follow for update