Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

About

Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $\theta_t = B_k^\star w_t$ with rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ constant within each of $K$ unknown segments and able to shift at boundaries. Our results are tight along three axes. (i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region. (ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank. (iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover. To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.

Hamed Khosravi, Xiaoming Huo• 2026

Related benchmarks

Task	Dataset	Result
Piecewise-stationary low-rank bandit	Synthetic (d, r)-grid	Costed Regret208	32
Clinical Dosing	Warfarin (IWPC cohort)	Costed Regret657	8
Linear Bandit Control	small-d piecewise-stationary stress test	Control Regret2.23e+3	6
Clinical Dosing	Vancomycin	Control Regret525.3	5
Piecewise low-rank bandit	MNIST	SPSC vs LinUCB Ratio0.84	4
Piecewise low-rank bandit	Fashion MNIST	SPSC/LinUCB Ratio88	4
Piecewise low-rank bandit	MovieLens 100k	SPSC/LinUCB Ratio1	4
Piecewise low-rank bandit	UCI Covertype	Performance Ratio (SPSC vs LinUCB)96	3
Piecewise low-rank bandit	pendigits	SPSC/LinUCB Ratio76	2
Piecewise low-rank bandit	SATIMAGE	Performance Ratio (SPSC vs LinUCB)47	2

Showing 10 of 10 rows

Other info

Follow for update

@wizwand_team Discord