Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Distinguished In Uniform: Self Attention Vs. Virtual Nodes

About

Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: Graphs of different sizes may require a different approximating network. We first clarify that this form of universality is not unique to GTs: Using the same positional encodings, also pure MPGNNs and even 2-layer MLPs are non-uniform universal approximators. We then consider uniform expressivity: The target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is in their global computation method -- Self-Attention Vs Virtual Node. We prove that none of the models is a uniform-universal approximator, before proving our main result: Neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data. We further augment our study with real-world datasets, observing mixed results which indicate no clear ranking in practice as well.

Eran Rosenbluth, Jan T\"onshoff, Martin Ritzert, Berke Kisin, Martin Grohe• 2024

Related benchmarks

TaskDatasetResultRank
Graph RegressionPeptides struct LRGB (test)
MAE0.2458
238
Graph Classificationogbg-molpcba (test)
AP29.07
212
Graph ClassificationPeptides-func LRGB (test)
AP0.6822
196
Superpixel classificationPascalVOC-SP LRGB (test)
F1 Score44.4
9
Superpixel classificationCOCO-SP LRGB (test)
F1 Score38.84
8
Showing 5 of 5 rows

Other info

Follow for update