
Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction

About

Feature selection is a key step in many tabular prediction problems, where multiple candidate variables may be redundant, noisy, or weakly informative. We investigate feature selection based on Kolmogorov-Arnold Networks (KANs), which parameterize feature transformations with splines and naturally expose per-feature importance scores. From this idea we derive four KAN-based selection criteria, spanning spline-coefficient norms, gradient-based saliency, and knockout scores, and compare them with standard methods such as LASSO, Random Forest feature importance, Mutual Information, and SVM-RFE on a suite of real and synthetic classification and regression datasets. Using average F1 and $R^2$ scores across three feature-retention levels (20%, 40%, 60%), we find that KAN-based selectors are generally competitive with, and sometimes superior to, classical baselines. In classification, KAN criteria often match or exceed existing methods on multi-class tasks by removing redundant features and capturing nonlinear interactions. In regression, KAN-based scores provide robust performance on noisy and heterogeneous datasets, closely tracking strong ensemble predictors; we also observe characteristic failure modes, such as overly aggressive pruning with an $\ell_1$ criterion. Stability and redundancy analyses further show that KAN-based selectors yield reproducible feature subsets across folds while avoiding unnecessary correlation inflation, ensuring reliable and non-redundant variable selection. Overall, our findings demonstrate that KAN-based feature selection provides a powerful and interpretable alternative to traditional methods, capable of uncovering nonlinear and multivariate feature relevance beyond sparsity- or impurity-based measures.
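The coefficient-norm criterion can be illustrated with a small sketch. This is not the authors' implementation: it substitutes a piecewise-linear hat-function basis and ridge regression for KAN spline layers, and scores each feature by the $\ell_2$ norm of its basis coefficients. Only numpy is assumed.

```python
import numpy as np

def hat_basis(x, knots):
    """Piecewise-linear 'hat' basis on equally spaced knots (a crude spline stand-in)."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs((x[:, None] - knots[None, :]) / h))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))          # feature 0 informative, feature 1 noise
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=500)

knots = np.linspace(-1.0, 1.0, 9)
K = len(knots)
# Concatenate each feature's basis expansion, then fit a ridge-regularized linear model.
Phi = np.hstack([hat_basis(X[:, j], knots) for j in range(X.shape[1])])
lam = 1e-3
coef = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Coefficient-norm importance: l2 norm of the coefficients attached to each feature.
scores = [float(np.linalg.norm(coef[j * K:(j + 1) * K])) for j in range(X.shape[1])]
print(scores)  # the informative feature should score far higher than the noise feature
```

A knockout-style score could be obtained from the same fitted model by zeroing one feature's coefficient block and measuring the resulting loss increase; the ranking it induces typically agrees with the norm criterion on this toy example.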

Ange-Clément Akazan, Verlon Roel Mbingui • 2025

Related benchmarks

Task            Dataset                          Metric     Result   Rank
Classification  Breast cancer                    Accuracy   97.73    56
Classification  Wine                             Macro F1   98.37    48
Regression      California                       R2 Score   84.31    40
Classification  make_classification              Macro F1   91.39    36
Regression      make_regression (60% retention)  R2 Score   99.999   36
Regression      diamonds (60% retention)         R2 Score   0.9817   36
Classification  Digits                           Macro F1   100      36
