Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity
About
Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $\theta_t = B_k^\star w_t$ with rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ constant within each of $K$ unknown segments and able to shift at boundaries. Our results are tight along three axes. (i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region. (ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank. (iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover. To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Piecewise-stationary low-rank bandit | Synthetic (d, r)-grid | Costed Regret208 | 32 | |
| Clinical Dosing | Warfarin (IWPC cohort) | Costed Regret657 | 8 | |
| Linear Bandit Control | small-d piecewise-stationary stress test | Control Regret2.23e+3 | 6 | |
| Clinical Dosing | Vancomycin | Control Regret525.3 | 5 | |
| Piecewise low-rank bandit | MNIST | SPSC vs LinUCB Ratio0.84 | 4 | |
| Piecewise low-rank bandit | Fashion MNIST | SPSC/LinUCB Ratio88 | 4 | |
| Piecewise low-rank bandit | MovieLens 100k | SPSC/LinUCB Ratio1 | 4 | |
| Piecewise low-rank bandit | UCI Covertype | Performance Ratio (SPSC vs LinUCB)96 | 3 | |
| Piecewise low-rank bandit | pendigits | SPSC/LinUCB Ratio76 | 2 | |
| Piecewise low-rank bandit | SATIMAGE | Performance Ratio (SPSC vs LinUCB)47 | 2 |