DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion
About
Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations. To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information by their interactions. The diffusion process is constrained by descent criteria w.r.t.~a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed as DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitive instance numbers, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Node Classification | Pubmed | Accuracy81.6 | 865 | |
| Node Classification | PubMed (test) | Accuracy36.1 | 586 | |
| Node Classification | ogbn-arxiv (test) | Accuracy72.21 | 497 | |
| Node Classification | Photo | Mean Accuracy95.1 | 374 | |
| Node Classification | Chameleon (test) | Mean Accuracy44.44 | 335 | |
| Node Classification | wikiCS | Accuracy73.46 | 329 | |
| Node Classification | Roman-Empire | Accuracy79.1 | 327 | |
| Node Classification | amazon-ratings | Accuracy47.84 | 309 | |
| Node Classification | Actor (test) | Mean Accuracy0.3673 | 286 | |
| Node Classification | Accuracy55.3 | 216 |