Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty
About
Probabilistic partial least squares (PPLS) is a central likelihood-based model for two-view learning when one needs both interpretable latent factors and calibrated uncertainty. Building on the identifiable parameterization of Bouhaddani et al. (2018), existing fitting pipelines still face two practical bottlenecks: noise-signal coupling under joint EM/ECM updates and nontrivial handling of orthogonality constraints. Following the fixed-noise scalar-likelihood line of Hu et al. (2025), we develop an end-to-end framework that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration in one pipeline. Relative to Hu et al. (2025), we replace full-spectrum noise averaging with noise-subspace estimation and replace interior-point penalty handling with exact Stiefel-manifold optimization. The noise-subspace estimator attains a signal-strength-independent leading finite-sample rate and matches a minimax lower bound, while the full-spectrum estimator is shown to be inconsistent under the same model. We further extend the framework to sub-Gaussian settings via optional Gaussianization and provide closed-form standard errors through a block-structured Fisher analysis. Across synthetic high-noise settings and two multi-omics benchmarks (TCGA-BRCA and PBMC CITE-seq), the method achieves near-nominal coverage without post-hoc recalibration, reaches Ridge-level point accuracy on TCGA-BRCA at rank $r=3$, matches or exceeds PO2PLS on cross-view prediction while providing native calibrated uncertainty, and improves stability of parameter recovery.
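The contrast between full-spectrum averaging and noise-subspace estimation can be illustrated on a toy problem. The sketch below is not the estimator from the paper or from Hu et al. (2025). It assumes a simplified single-view model $x = W t + e$ with orthonormal $W$, latent variances `theta`, and isotropic noise of variance `sigma2`, and it reads "full-spectrum" as the mean of all sample-covariance eigenvalues and "noise-subspace" as the mean of the smallest $p - r$ eigenvalues. The function name `noise_variance_estimates` and all parameter values are hypothetical.

```python
import numpy as np

def noise_variance_estimates(X, r):
    """Return (noise_subspace, full_spectrum) estimates of the noise variance.

    Assumption (toy model, not the paper's estimator): the population
    covariance is W diag(theta) W^T + sigma2 * I with W of rank r.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)               # center each feature
    S = Xc.T @ Xc / n                     # sample covariance, shape (p, p)
    eigvals = np.linalg.eigvalsh(S)       # eigenvalues in ascending order
    noise_subspace = eigvals[: p - r].mean()  # smallest p - r eigenvalues only
    full_spectrum = eigvals.mean()            # all eigenvalues, signal included
    return noise_subspace, full_spectrum

# Simulate data from the assumed toy model.
rng = np.random.default_rng(0)
n, p, r, sigma2 = 2000, 50, 3, 0.5
W, _ = np.linalg.qr(rng.standard_normal((p, r)))   # orthonormal loadings
theta = np.array([9.0, 4.0, 1.0])                  # latent variances
T = rng.standard_normal((n, r)) * np.sqrt(theta)   # latent scores
X = T @ W.T + np.sqrt(sigma2) * rng.standard_normal((n, p))

ns, fs = noise_variance_estimates(X, r)
print(f"true sigma2    = {sigma2}")
print(f"noise-subspace = {ns:.3f}")
print(f"full-spectrum  = {fs:.3f}")
```

In this toy setting the full-spectrum average concentrates around `sigma2 + theta.sum() / p` rather than `sigma2`, so its error grows with signal strength, whereas the noise-subspace average stays close to `sigma2` regardless of `theta`.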
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Uncertainty Calibration | Gaussian synthetic benchmark (5-fold CV) | Empirical Coverage | 95.1 | 10 |
| Uncertainty Estimation | TCGA-BRCA | MSE | 0.4498 | 9 |
| Uncertainty Estimation | CITE-seq | MSE | 0.2586 | 9 |
| Gene-protein pair detection | TCGA-BRCA | Total Detected Pairs | 1.27e+5 | 8 |
| Protein imputation | PBMC CITE-seq (3-fold CV) | MSE | 0.2586 | 7 |
| Parameter Estimation | Synthetic p=q=200, M=20, Low noise | MSE (W) | 0.01 | 6 |
| Parameter Estimation | Synthetic p=q=200, M=20, High noise | MSE (W) | 0.08 | 6 |
| Parameter Estimation | Synthetic p=q=500, M=10, Low noise | MSE (W) | 0.01 | 5 |
| Parameter Estimation | Synthetic p=q=500, M=10, High noise | MSE (W) | 0.08 | 5 |
| Point Prediction | TCGA-BRCA (5-fold CV) | MSE | 0.4498 | 5 |