Distinguished In Uniform: Self Attention Vs. Virtual Nodes

About

Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: Graphs of different sizes may require a different approximating network. We first clarify that this form of universality is not unique to GTs: Using the same positional encodings, also pure MPGNNs and even 2-layer MLPs are non-uniform universal approximators. We then consider uniform expressivity: The target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is in their global computation method -- Self-Attention Vs Virtual Node. We prove that none of the models is a uniform-universal approximator, before proving our main result: Neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data. We further augment our study with real-world datasets, observing mixed results which indicate no clear ranking in practice as well.

Eran Rosenbluth, Jan T\"onshoff, Martin Ritzert, Berke Kisin, Martin Grohe• 2024

Related benchmarks

Task	Dataset	Result
Graph Regression	Peptides struct LRGB (test)	MAE0.2458	238
Graph Classification	ogbg-molpcba (test)	AP29.07	212
Graph Classification	Peptides-func LRGB (test)	AP0.6822	196
Superpixel classification	PascalVOC-SP LRGB (test)	F1 Score44.4	9
Superpixel classification	COCO-SP LRGB (test)	F1 Score38.84	8

Showing 5 of 5 rows

Other info

Follow for update

@wizwand_team Discord