Operational Feature Fingerprints of Graph Datasets via a White-Box Signal-Subspace Probe

About

Graph neural networks achieve strong node-classification accuracy, but learned message passing entangles ego attributes, neighborhood smoothing, high-pass graph differences, class geometry, and classifier-boundary effects inside opaque representations. This obscures why nodes are classified as they are and which graph-learning mechanisms a dataset requires. We propose WG-SRC, a white-box signal-subspace probe for prediction and graph dataset diagnosis. WG-SRC replaces learned message passing with a fixed, named graph-signal dictionary containing raw features, row- and symmetric-normalized low-pass propagation, and high-pass graph differences. It combines Fisher coordinate selection, class-wise PCA subspaces, closed-form multi-alpha ridge classification, and validation-based score fusion, so prediction and analysis rely on explicit class subspaces, energy-controlled dimensions, and closed-form linear decisions. As a white-box graph-learning instrument, WG-SRC uses predictive performance to validate its diagnostics. Across six node-classification datasets, it remains competitive with reproduced baselines and achieves positive average gain under aligned splits. Its atlas decomposes behavior into raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary components. The resulting fingerprints distinguish low-pass-dominated Amazon graphs, mixed high-pass and class-geometrically complex Chameleon behavior, and raw- or boundary-sensitive WebKB graphs. Aligned interventions show when high-pass blocks act as removable noise, when raw or graph-derived signals should be preserved, and when ridge correction matters. WG-SRC therefore serves both as a functioning white-box classifier and as a dataset-fingerprinting probe, enabling fingerprint-conditioned analysis of how black-box model components behave under different graph-signal conditions.
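
As a rough illustration of the fixed graph-signal dictionary and the closed-form ridge step described above, the following minimal NumPy sketch builds raw, low-pass, and high-pass blocks on a toy graph and fits ridge classifiers for several alpha values. It is not the authors' implementation. The function names (`signal_dictionary`, `ridge_fit`), the added self-loops, the single propagation hop, the alpha grid, and the plain averaging of scores are all assumptions made for illustration. Fisher coordinate selection, class-wise PCA subspaces, and validation-based score fusion are omitted.

```python
import numpy as np

def signal_dictionary(A, X):
    """Build named graph-signal blocks from adjacency A and features X.

    Assumption: self-loops are added and one propagation hop is used;
    the abstract does not specify either choice.
    """
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)
    P_row = A_hat / deg[:, None]                            # D^-1 A
    inv_sqrt = 1.0 / np.sqrt(deg)
    P_sym = inv_sqrt[:, None] * A_hat * inv_sqrt[None, :]   # D^-1/2 A D^-1/2
    low_row, low_sym = P_row @ X, P_sym @ X
    return {
        "raw": X,
        "low_row": low_row,          # row-normalized low-pass
        "low_sym": low_sym,          # symmetric-normalized low-pass
        "high_row": X - low_row,     # high-pass graph difference
        "high_sym": X - low_sym,
    }

def ridge_fit(F, Y, alpha):
    """Closed-form ridge weights: W = (F^T F + alpha I)^-1 F^T Y."""
    gram = F.T @ F + alpha * np.eye(F.shape[1])
    return np.linalg.solve(gram, F.T @ Y)

# Toy graph (hypothetical): two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))              # random node features
Y = np.eye(2)[[0, 0, 0, 1, 1, 1]]        # one-hot labels, one class per triangle

blocks = signal_dictionary(A, X)
F = np.hstack([blocks[name] for name in blocks])   # concatenate all blocks

# Multi-alpha ridge; plain averaging stands in for validation-based fusion.
alphas = [0.1, 1.0, 10.0]
scores = sum(F @ ridge_fit(F, Y, a) for a in alphas) / len(alphas)
pred = scores.argmax(axis=1)
```

Because every block is named, a block can be dropped from `F` before fitting, which is one simple way to mimic the kind of block-level intervention the abstract mentions (for example, removing the high-pass blocks and comparing accuracy).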

Yuchen Xiong, Swee Keong Yeap, Zhen Hong Ban • 2026

Related benchmarks

Task	Dataset	Result
Node Classification	Chameleon (test)	Mean Accuracy 72.48	425
Node Classification	Cornell (test)	Mean Accuracy 75.41	403
Node Classification	Texas (test)	Mean Accuracy 86.32	402
Node Classification	Wisconsin (test)	Mean Accuracy 84.31	346
Node Classification	Amazon Photo (test)	Accuracy 88.76	112
Node Classification	Amazon Computer (test)	Accuracy 78.71	104
