Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

About

We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a practical distinction that is often blurred in the literature: many "strictly batched" methods additionally restrict within-interval context adaptivity, meaning that the action rule inside an interval cannot depend on the sequence of realized contexts/actions in that interval (beyond the current round's context). For linear contextual bandits, we propose two practical algorithms with only $O(\log\log T)$ parameter updates. Our first algorithm BLCE-G attains minimax-optimal regret (up to polylogarithmic factors in $T$) simultaneously in both the small-$K$ and large-$K$ regimes under a static schedule. Our second algorithm BLCE removes the near G-optimal design step -- a dominant computational bottleneck in prior strictly batched static-grid methods -- yet preserves minimax-optimal regret and achieves the lowest known runtime complexity among optimal algorithms. We further extend these rare-update and computational principles to generalized linear contextual bandits. Overall, our results yield statistically optimal algorithms under $O(\log\log T)$ parameter updates that are also computationally efficient in practice.

Sanghoon Yu, Min-hwan Oh• 2026

Related benchmarks

Task	Dataset	Result
Linear Contextual Bandits	Synthetic Linear Bandit K=5000, d=10	Average Runtime (s)6.19	12
Linear Contextual Bandits	Synthetic Linear Bandit K=50, d=20	Average Runtime (s)1.06	6
Linear Contextual Bandits	Synthetic Linear Bandit (K=100, d=30)	Average Runtime (s)1.62	6
Linear Contextual Bandits	Normal Contexts K=50, d=20 synthetic (test)	Average Runtime (s)1.05	6
Linear Contextual Bandits	Normal Contexts K=100, d=30 synthetic (test)	Average Runtime (s)1.41	6
Linear Contextual Bandits	Synthetic Linear Bandit K=1000, d=5	Average Runtime (s)5.91	6
Linear Contextual Bandits	Normal Contexts K=1000, d=5 synthetic (test)	Average Runtime (s)2.17	6
Linear Contextual Bandit	Uniform distribution	Average Runtime (s)3.62	4
Linear Contextual Bandit	Normal distribution	Average Runtime (seconds)3.88	4
Regret Minimization	Generalized Linear Contextual Bandits	Worst-Case Regret2	2

Showing 10 of 10 rows

Other info

Follow for update

@wizwand_team Discord