Vision Graph Prompting via Semantic Low-Rank Decomposition
About
Vision GNN (ViG) demonstrates superior performance by representing images as graph structures, which capture irregular semantic patterns more naturally than traditional grid- or sequence-based representations. To adapt ViG efficiently to downstream tasks, parameter-efficient fine-tuning techniques such as visual prompting have become increasingly essential. However, existing prompting methods are designed primarily for Transformer-based models; they neglect the rich topological relationships among nodes and edges in graph-based representations, which limits their capacity to model complex semantics. In this paper, we propose Vision Graph Prompting (VGP), a novel framework tailored to vision graph structures. Our core insight is that semantically connected components in the graph exhibit low-rank properties. Building on this observation, we introduce a semantic low-rank prompting method that decomposes low-rank semantic features and integrates them with prompts on vision graph topologies, capturing both global structural patterns and fine-grained semantic dependencies. Extensive experiments demonstrate that our method significantly improves ViG's transfer performance on diverse downstream tasks, achieving results comparable to full fine-tuning while maintaining parameter efficiency. Our code is available at https://github.com/zhoujiahuan1991/ICML2025-VGP.
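To make the idea of a low-rank prompt on a vision graph concrete, the following is a minimal NumPy sketch. It is not the released VGP implementation: the function names `knn_graph` and `low_rank_prompt`, the mean aggregation over neighbours, the random initialisation, and all sizes (`n`, `d`, `r`, `k`) are assumptions chosen purely for illustration. It builds a k-nearest-neighbour graph over node (patch) features, as graph-based vision backbones typically do, and adds an update that is forced through an `r`-dimensional bottleneck, so the change to the features has rank at most `r`.

```python
import numpy as np

def knn_graph(x, k):
    """Return, for each node, the indices of its k nearest neighbours (Euclidean)."""
    # Pairwise squared distances between all node features, shape (n, n).
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops
    return np.argsort(d2, axis=1)[:, :k]

def low_rank_prompt(x, nbrs, down, up):
    """Add a rank-r update computed from each node and the mean of its neighbours.

    Illustrative assumption: the prompt is an additive bottleneck
    (d -> r -> d) applied to node-plus-neighbourhood features.
    """
    agg = x[nbrs].mean(axis=1)          # (n, d) neighbourhood summary
    return x + ((x + agg) @ down) @ up  # update passes through r dimensions

rng = np.random.default_rng(0)
n, d, r, k = 16, 32, 4, 3               # hypothetical sizes
x = rng.standard_normal((n, d))         # stand-in for frozen backbone features
nbrs = knn_graph(x, k)

# Only these two small matrices would be trained; the backbone stays frozen.
down = rng.standard_normal((d, r)) * 0.1
up = rng.standard_normal((r, d)) * 0.1

out = low_rank_prompt(x, nbrs, down, up)
delta_rank = np.linalg.matrix_rank(out - x)
trainable = down.size + up.size         # 2 * d * r parameters
full = d * d                            # a dense d x d update, for comparison
```

In this sketch the two factors hold `2 * d * r` values, against `d * d` for a dense update of the same shape, which is the usual source of parameter efficiency in low-rank adaptation. How VGP actually decomposes and injects the semantic features is described in the paper and the linked repository.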
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Graph property prediction | Tox21 | ROC-AUC 0.8008 | 109 |
| Graph property prediction | ClinTox | ROC-AUC 74.82 | 102 |
| Graph property prediction | BACE | ROC-AUC 83.62 | 101 |
| Graph property prediction | ToxCast | ROC-AUC 0.6825 | 95 |
| Graph property prediction | MUV | ROC-AUC 0.8264 | 95 |
| Graph property prediction | SIDER | ROC-AUC 67.12 | 95 |
| Graph classification | BBBP | ROC-AUC 70.17 | 8 |