SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data

About

With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimation and reduced predictive accuracy. This work proposes the Single-Parametric Principal Component Selection Operator (SPPCSO), an innovative penalized estimation method that integrates single-parametric principal component regression and $L_{1}$ regularization to adaptively adjust the shrinkage factor by incorporating principal component information. This approach achieves a balance between variable selection and coefficient estimation, ensuring model stability and robust estimation even in high-dimensional, high-noise environments. The primary contribution lies in addressing the instability of traditional variable selection methods when applied to high-noise, high-dimensional correlated data. Theoretically, our method exhibits selection consistency and achieves a smaller estimation error bound compared to traditional penalized estimation approaches. Extensive numerical experiments demonstrate that SPPCSO not only delivers stable and reliable estimation in high-noise settings but also accurately distinguishes signal variables from noise variables in group-effect structured data with highly correlated noise variables, effectively eliminating redundant variables and achieving more stable variable selection. Furthermore, SPPCSO successfully identifies disease-associated genes in gene expression data analysis, showcasing strong practical value. The results indicate that SPPCSO serves as an ideal tool for high-dimensional variable selection, offering an efficient and interpretable solution for modeling correlated data.

Ying Hu, Hu Yang • 2026

Related benchmarks

Task	Dataset	Metric	Value	
Variable Selection	Example 1	TPR	100	24
Estimation	Example 2 (rho=0.5)	Estimation Error	1.2182	16
Estimation	Example 2 (rho=0.75)	Estimation Error	1.1597	16
Estimation	Example 2 (rho=0.95)	Estimation Error	1.1147	16
Estimation Error	Example 1 sigma=2 (N=100)	Estimation Error	1.1677	8
Penalized estimation	rat genetic data (test)	MAPE	8.03	8
Sparse Modeling	Example 1 sigma=2	Pre Error	4.6958	8
Variable Selection	Example 2 (rho=0.75)	TPR	100	8
Variable Selection	Example 2 (rho=0.95)	TPR	100	8
Estimation Error	Example 1 sigma=1 (N=100)	Estimation Error	1.0472	8

Showing 10 of 15 rows
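
The table reports TPR, Estimation Error, and MAPE without defining them. The minimal sketch below shows how such metrics are commonly computed. The definitions are assumptions rather than the authors' stated formulas: TPR is taken as the percentage of true signal variables that were selected, estimation error as the Euclidean distance between estimated and true coefficients, and MAPE as the mean absolute percentage error. All function names and sample values are hypothetical.

```python
import math


def true_positive_rate(selected, true_support):
    """Percentage of true signal variables that appear in the selected set (assumed definition)."""
    true_support = set(true_support)
    if not true_support:
        raise ValueError("true_support must not be empty")
    hits = len(true_support & set(selected))
    return 100.0 * hits / len(true_support)


def estimation_error(beta_hat, beta_true):
    """Euclidean (L2) distance between estimated and true coefficient vectors (assumed definition)."""
    if len(beta_hat) != len(beta_true):
        raise ValueError("coefficient vectors must have the same length")
    return math.sqrt(sum((h - t) ** 2 for h, t in zip(beta_hat, beta_true)))


def mape(y_pred, y_true):
    """Mean absolute percentage error, in percent; requires nonzero true values."""
    if len(y_pred) != len(y_true):
        raise ValueError("prediction and target vectors must have the same length")
    return 100.0 * sum(abs((p - t) / t) for p, t in zip(y_pred, y_true)) / len(y_true)


# Illustrative values only; these are not taken from the paper.
beta_true = [3.0, 1.5, 0.0, 0.0, 2.0]
beta_hat = [2.8, 1.6, 0.0, 0.1, 1.9]

# Indices of nonzero coefficients define the selected set and the true support.
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
support = [j for j, b in enumerate(beta_true) if b != 0.0]

print("TPR (%):", true_positive_rate(selected, support))
print("Estimation error:", round(estimation_error(beta_hat, beta_true), 4))
print("MAPE (%):", round(mape([110.0, 90.0], [100.0, 100.0]), 2))
```

Under these assumed definitions, a TPR of 100 means every true signal variable was recovered. It does not by itself show that noise variables were excluded, so it is usually read alongside a false-positive measure.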
