
Operational Feature Fingerprints of Graph Datasets via a White-Box Signal-Subspace Probe

About

Graph neural networks achieve strong node-classification accuracy, but learned message passing entangles ego attributes, neighborhood smoothing, high-pass graph differences, class geometry, and classifier-boundary effects inside opaque representations. This obscures why nodes are classified as they are and which graph-learning mechanisms a dataset requires. We propose WG-SRC, a white-box signal-subspace probe for both prediction and graph dataset diagnosis. WG-SRC replaces learned message passing with a fixed, named graph-signal dictionary containing raw features, row- and symmetric-normalized low-pass propagation, and high-pass graph differences. It combines Fisher coordinate selection, class-wise PCA subspaces, closed-form multi-alpha ridge classification, and validation-based score fusion, so that both prediction and analysis rest on explicit class subspaces, energy-controlled dimensions, and closed-form linear decisions. Because WG-SRC is a white-box graph-learning instrument, its predictive performance serves to validate its diagnostics. Across six node-classification datasets, it remains competitive with reproduced baselines and achieves a positive average gain under aligned splits. Its diagnostic atlas decomposes model behavior into raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary components. The resulting fingerprints distinguish low-pass-dominated Amazon graphs, Chameleon behavior that mixes high-pass effects with complex class geometry, and raw- or boundary-sensitive WebKB graphs. Aligned interventions show when high-pass blocks act as removable noise, when raw or graph-derived signals should be preserved, and when ridge correction matters. WG-SRC therefore serves both as a functioning white-box classifier and as a dataset-fingerprinting probe, enabling fingerprint-conditioned analysis of how black-box model components behave under different graph-signal conditions.
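To make the pipeline in the abstract concrete, here is a minimal NumPy sketch of its stages: a named signal dictionary, Fisher coordinate selection, class-wise PCA subspaces, closed-form ridge, and validation-based fusion. This is not the authors' WG-SRC implementation. Every function name and hyperparameter is hypothetical, and several details are our own assumptions: self-loops are added before normalizing, propagation is a single hop, the class-subspace score is the negative residual norm, "multi-alpha" is read as picking one alpha from a small grid on validation, and fusion is a convex weight over row-standardized scores.

```python
import numpy as np


def signal_dictionary(X, A):
    """Stack named graph-signal blocks: raw, two low-pass views, one high-pass view."""
    A = A + np.eye(A.shape[0])                 # assumption: self-loops before normalizing
    deg = A.sum(axis=1)
    A_rw = A / deg[:, None]                    # row-normalized propagation D^-1 A
    d = 1.0 / np.sqrt(deg)
    A_sym = d[:, None] * A * d[None, :]        # symmetric propagation D^-1/2 A D^-1/2
    blocks = {
        "raw": X,
        "low_rw": A_rw @ X,                    # one-hop low-pass (single hop is assumed)
        "low_sym": A_sym @ X,
        "high": X - A_rw @ X,                  # high-pass difference (I - D^-1 A) X
    }
    names = [f"{b}[{j}]" for b, M in blocks.items() for j in range(M.shape[1])]
    return np.hstack(list(blocks.values())), names


def fisher_select(Z, y, k):
    """Rank coordinates by between-class / within-class scatter; keep the top k."""
    mu = Z.mean(axis=0)
    between = np.zeros(Z.shape[1])
    within = np.zeros(Z.shape[1])
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * (Zc.mean(axis=0) - mu) ** 2
        within += ((Zc - Zc.mean(axis=0)) ** 2).sum(axis=0)
    return np.argsort(between / (within + 1e-12))[::-1][:k]


def class_subspaces(Z, y, energy=0.9):
    """Per-class PCA basis whose dimension keeps `energy` of the class variance."""
    subs = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        mu = Zc.mean(axis=0)
        _, s, Vt = np.linalg.svd(Zc - mu, full_matrices=False)
        cum = np.cumsum(s ** 2) / max((s ** 2).sum(), 1e-12)
        r = int(np.searchsorted(cum, energy)) + 1  # smallest r reaching the energy level
        subs[c] = (mu, Vt[:r])
    return subs


def subspace_scores(Z, subs, classes):
    """Score = negative residual norm after projecting onto each class subspace."""
    S = np.empty((Z.shape[0], len(classes)))
    for j, c in enumerate(classes):
        mu, V = subs[c]
        diff = Z - mu
        S[:, j] = -np.linalg.norm(diff - (diff @ V.T) @ V, axis=1)
    return S


def with_bias(M):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([M, np.ones((M.shape[0], 1))])


def ridge_scores(Ztr, Ytr, Z, alpha):
    """Closed-form ridge on one-hot targets: W = (Z^T Z + alpha I)^-1 Z^T Y (bias penalized too)."""
    Ztr1 = with_bias(Ztr)
    W = np.linalg.solve(Ztr1.T @ Ztr1 + alpha * np.eye(Ztr1.shape[1]), Ztr1.T @ Ytr)
    return with_bias(Z) @ W


def fit_predict(X, A, y, train, val, test, k=64, energy=0.9,
                alphas=(0.1, 1.0, 10.0), weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Hypothetical end-to-end sketch; returns test predictions and kept coordinate names."""
    Z, names = signal_dictionary(X, A)
    cols = fisher_select(Z[train], y[train], k)
    Z = Z[:, cols]
    classes = np.unique(y[train])
    Ytr = (y[train][:, None] == classes[None, :]).astype(float)

    def acc(S, idx):
        return np.mean(classes[S[idx].argmax(axis=1)] == y[idx])

    def standardize(S):  # row-wise z-score so the two score types are comparable
        return (S - S.mean(axis=1, keepdims=True)) / (S.std(axis=1, keepdims=True) + 1e-12)

    S_sub = standardize(subspace_scores(Z, class_subspaces(Z[train], y[train], energy), classes))
    # assumed reading of "multi-alpha": fit every alpha, keep the best on validation
    S_ridge = max((ridge_scores(Z[train], Ytr, Z, a) for a in alphas), key=lambda S: acc(S, val))
    S_ridge = standardize(S_ridge)
    # validation-based fusion: convex weight between subspace and ridge scores
    fused = max((w * S_sub + (1 - w) * S_ridge for w in weights), key=lambda S: acc(S, val))
    return classes[fused[test].argmax(axis=1)], [names[i] for i in cols]
```

The second return value lists which named coordinates (for example `low_rw[0]` or `high[3]`) survive Fisher selection. Because every column keeps its block name, block-level attribution can be read off directly rather than recovered from a learned embedding.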

Yuchen Xiong, Swee Keong Yeap, Zhen Hong Ban · 2026

Related benchmarks

Task                  Dataset                   Metric          Result (%)   Rank
Node Classification   Chameleon (test)          Mean Accuracy   72.48        335
Node Classification   Cornell (test)            Mean Accuracy   75.41        313
Node Classification   Texas (test)              Mean Accuracy   86.32        312
Node Classification   Wisconsin (test)          Mean Accuracy   84.31        279
Node Classification   Amazon Photo (test)       Accuracy        88.76        112
Node Classification   Amazon Computer (test)    Accuracy        78.71        104
