Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

About

We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a practical distinction that is often blurred in the literature: many "strictly batched" methods additionally restrict within-interval context adaptivity, meaning that the action rule inside an interval cannot depend on the sequence of realized contexts/actions in that interval (beyond the current round's context). For linear contextual bandits, we propose two practical algorithms with only $O(\log\log T)$ parameter updates. Our first algorithm BLCE-G attains minimax-optimal regret (up to polylogarithmic factors in $T$) simultaneously in both the small-$K$ and large-$K$ regimes under a static schedule. Our second algorithm BLCE removes the near G-optimal design step -- a dominant computational bottleneck in prior strictly batched static-grid methods -- yet preserves minimax-optimal regret and achieves the lowest known runtime complexity among optimal algorithms. We further extend these rare-update and computational principles to generalized linear contextual bandits. Overall, our results yield statistically optimal algorithms under $O(\log\log T)$ parameter updates that are also computationally efficient in practice.

Sanghoon Yu, Min-hwan Oh• 2026

Related benchmarks

TaskDatasetResultRank
Linear Contextual BanditsSynthetic Linear Bandit K=5000, d=10
Average Runtime (s)6.19
12
Linear Contextual BanditsSynthetic Linear Bandit K=50, d=20
Average Runtime (s)1.06
6
Linear Contextual BanditsSynthetic Linear Bandit (K=100, d=30)
Average Runtime (s)1.62
6
Linear Contextual BanditsNormal Contexts K=50, d=20 synthetic (test)
Average Runtime (s)1.05
6
Linear Contextual BanditsNormal Contexts K=100, d=30 synthetic (test)
Average Runtime (s)1.41
6
Linear Contextual BanditsSynthetic Linear Bandit K=1000, d=5
Average Runtime (s)5.91
6
Linear Contextual BanditsNormal Contexts K=1000, d=5 synthetic (test)
Average Runtime (s)2.17
6
Linear Contextual BanditUniform distribution
Average Runtime (s)3.62
4
Linear Contextual BanditNormal distribution
Average Runtime (seconds)3.88
4
Regret MinimizationGeneralized Linear Contextual Bandits
Worst-Case Regret2
2
Showing 10 of 10 rows

Other info

Follow for update