Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

About

Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $\theta_t = B_k^\star w_t$ with rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ constant within each of $K$ unknown segments and able to shift at boundaries. Our results are tight along three axes. (i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region. (ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank. (iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover. To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.

Hamed Khosravi, Xiaoming Huo• 2026

Related benchmarks

TaskDatasetResultRank
Piecewise-stationary low-rank banditSynthetic (d, r)-grid
Costed Regret208
32
Clinical DosingWarfarin (IWPC cohort)
Costed Regret657
8
Linear Bandit Controlsmall-d piecewise-stationary stress test
Control Regret2.23e+3
6
Clinical DosingVancomycin
Control Regret525.3
5
Piecewise low-rank banditMNIST
SPSC vs LinUCB Ratio0.84
4
Piecewise low-rank banditFashion MNIST
SPSC/LinUCB Ratio88
4
Piecewise low-rank banditMovieLens 100k
SPSC/LinUCB Ratio1
4
Piecewise low-rank banditUCI Covertype
Performance Ratio (SPSC vs LinUCB)96
3
Piecewise low-rank banditpendigits
SPSC/LinUCB Ratio76
2
Piecewise low-rank banditSATIMAGE
Performance Ratio (SPSC vs LinUCB)47
2
Showing 10 of 10 rows

Other info

Follow for update