SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data
About
With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimation and reduced predictive accuracy. This work proposes the Single-Parametric Principal Component Selection Operator (SPPCSO), an innovative penalized estimation method that integrates single-parametric principal component regression and $L_{1}$ regularization to adaptively adjust the shrinkage factor by incorporating principal component information. This approach achieves a balance between variable selection and coefficient estimation, ensuring model stability and robust estimation even in high-dimensional, high-noise environments. The primary contribution lies in addressing the instability of traditional variable selection methods when applied to high-noise, high-dimensional correlated data. Theoretically, our method exhibits selection consistency and achieves a smaller estimation error bound compared to traditional penalized estimation approaches. Extensive numerical experiments demonstrate that SPPCSO not only delivers stable and reliable estimation in high-noise settings but also accurately distinguishes signal variables from noise variables in group-effect structured data with highly correlated noise variables, effectively eliminating redundant variables and achieving more stable variable selection. Furthermore, SPPCSO successfully identifies disease-associated genes in gene expression data analysis, showcasing strong practical value. The results indicate that SPPCSO serves as an ideal tool for high-dimensional variable selection, offering an efficient and interpretable solution for modeling correlated data.
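To make the $L_{1}$ ingredient concrete, here is a minimal sketch of the standard lasso solved by cyclic coordinate descent with soft-thresholding. It is not the SPPCSO estimator: the abstract does not specify the principal-component-based adaptive shrinkage factor, so it is left out. The names `soft_threshold` and `lasso_cd`, the penalty level `lam`, and the toy data are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Plain lasso via cyclic coordinate descent (illustrative only).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1,
    where X is a list of rows and y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Toy design with two orthogonal columns: a strong and a weak signal.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.1, 3.0, 0.1]
beta = lasso_cd(X, y, lam=0.2)
print(beta)
```

In this toy case the strong coefficient is shrunk but kept, while the weak one is set exactly to zero, which is the selection behaviour that $L_{1}$ penalties provide. With highly correlated columns the plain lasso becomes unstable, which is the setting the abstract targets.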
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Variable Selection | Example 1 | TPR | 100 | 24 |
| Estimation | Example 2 (rho=0.5) | Estimation Error | 1.2182 | 16 |
| Estimation | Example 2 (rho=0.75) | Estimation Error | 1.1597 | 16 |
| Estimation | Example 2 (rho=0.95) | Estimation Error | 1.1147 | 16 |
| Estimation Error | Example 1 sigma=2 (N=100) | Estimation Error | 1.1677 | 8 |
| Penalized estimation | rat genetic data (test) | MAPE | 8.03 | 8 |
| Sparse Modeling | Example 1 sigma=2 | Pre Error | 4.6958 | 8 |
| Variable Selection | Example 2 (rho=0.75) | TPR | 100 | 8 |
| Variable Selection | Example 2 (rho=0.95) | TPR | 100 | 8 |
| Estimation Error | Example 1 sigma=1 (N=100) | Estimation Error | 1.0472 | 8 |
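
The table does not define its metrics, so the following sketch uses common textbook definitions as assumptions: TPR is the percentage of true signal variables that are selected, estimation error is the Euclidean distance between estimated and true coefficients, and MAPE is the mean absolute percentage error. The function names and the toy inputs are hypothetical, and the paper's exact definitions may differ.

```python
import math


def true_positive_rate(selected, true_support):
    """Percentage of true signal variables that were selected."""
    true_support = set(true_support)
    if not true_support:
        raise ValueError("true_support must be non-empty")
    hits = len(true_support & set(selected))
    return 100.0 * hits / len(true_support)


def estimation_error(beta_hat, beta_true):
    """Euclidean (L2) distance between estimated and true coefficients."""
    return math.sqrt(
        sum((bh - bt) ** 2 for bh, bt in zip(beta_hat, beta_true))
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be nonzero."""
    total = sum(abs((yt - yp) / yt) for yt, yp in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Toy example: two true signals, one false positive at index 3.
beta_true = [3.0, 1.5, 0.0, 0.0]
beta_hat = [3.0, 1.5, 0.0, 0.5]
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
support = [j for j, b in enumerate(beta_true) if b != 0.0]

print(true_positive_rate(selected, support))
print(estimation_error(beta_hat, beta_true))
print(mape([100.0, 200.0], [110.0, 180.0]))
```

Note that under this definition a TPR of 100 says only that no signal variable was missed. It does not penalize false positives, so it is usually read alongside an error or false-positive measure.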