Policy-Controlled Generalized Share: A General Framework with a Transformer Instantiation for Strictly Online Switching-Oracle Tracking
About
Static regret to a single expert is often the wrong target for strictly online prediction under non-stationarity, where the best expert may switch repeatedly over time. We study Policy-Controlled Generalized Share (PCGS), a general strictly online framework in which the generalized-share recursion is fixed while the post-loss update controls are allowed to vary adaptively. Its principal instantiation in this paper is PCGS-TF, which uses a causal Transformer as an update controller: after round t finishes and the loss vector is observed, the Transformer outputs the controls that map w_t to w_{t+1} without altering the already committed decision w_t. Under admissible post-loss update controls, we obtain a pathwise weighted regret guarantee for general time-varying learning rates, and a standard dynamic-regret guarantee against any expert path with at most S switches under the constant-learning-rate specialization. Empirically, on a controlled synthetic suite with exact dynamic-programming switching-oracle evaluation, PCGS-TF attains the lowest mean dynamic regret in all seven non-stationary families, with its advantage increasing for larger expert pools. On a reproduced household-electricity benchmark, PCGS-TF also achieves the lowest normalized dynamic regret for S = 5, 10, and 20.
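
The update order described above (commit w_t, observe the loss vector, then let a controller choose the controls that produce w_{t+1}) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the classic share form (an exponential-weights step followed by mixing toward a restart distribution) with a per-round learning rate `eta` and share rate `alpha` as the controls, and it stands in a hypothetical callable `controller` for the causal Transformer. The exact recursion and admissible control set used by PCGS are not given in this summary.

```python
import numpy as np

def generalized_share_step(w, losses, eta, alpha, prior=None):
    """One post-loss update w_t -> w_{t+1} (assumed share-style form)."""
    w = np.asarray(w, dtype=float)
    losses = np.asarray(losses, dtype=float)
    if prior is None:
        # Assumption: uniform restart distribution over experts.
        prior = np.full_like(w, 1.0 / w.size)
    # Loss step: exponential weights, shifted by the min loss for stability.
    v = w * np.exp(-eta * (losses - losses.min()))
    v /= v.sum()
    # Share step: mix toward the restart distribution.
    return (1.0 - alpha) * v + alpha * prior

def run(loss_matrix, controller):
    """Strictly online loop; `controller` is a hypothetical stand-in
    for the Transformer and only sees losses up to the current round."""
    loss_matrix = np.asarray(loss_matrix, dtype=float)
    T, N = loss_matrix.shape
    w = np.full(N, 1.0 / N)
    total = 0.0
    for t in range(T):
        # w_t is committed before the round-t losses are revealed.
        total += float(w @ loss_matrix[t])
        # Controls are chosen after the loss vector is observed.
        eta, alpha = controller(t, loss_matrix[: t + 1])
        w = generalized_share_step(w, loss_matrix[t], eta, alpha)
    return total, w
```

A constant controller such as `lambda t, history: (1.0, 0.05)` makes this sketch behave like ordinary fixed share with a fixed learning rate; an adaptive controller changes only the returned `(eta, alpha)` pair, never the already committed `w`.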
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Dynamic Regret Minimization | household electricity consumption real-data | Normalized Dynamic Regret | 0.0063 | 16 |
| Online Learning (Dynamic Regret Minimization) | Switch Family Synthetic | Mean Dynamic Regret | 18.93 | 4 |
| Online Learning (Dynamic Regret Minimization) | Drift Family Synthetic | Dynamic Regret (mean) | 18.64 | 4 |
| Online Learning (Dynamic Regret Minimization) | Hetero Family Synthetic | Mean Dynamic Regret | 20.73 | 4 |
| Online Learning (Dynamic Regret Minimization) | HeavyTail Family Synthetic | Mean Dynamic Regret | 21.17 | 4 |
| Online Learning (Dynamic Regret Minimization) | Mix Family Synthetic | Mean Dynamic Regret | 19.39 | 4 |
| Online Learning (Dynamic Regret Minimization) | Predictive Family Synthetic | Mean Dynamic Regret | 16.14 | 4 |
| Online Learning (Dynamic Regret Minimization) | Adversarial Family Synthetic | Mean Dynamic Regret | 20.64 | 4 |
| Online sequence prediction | Synthetic non-stationary sequences Adversarial family | Win Rate (vs GenShare) | 100 | 1 |
| Online sequence prediction | Synthetic non-stationary sequences Drift family | Win Rate vs GenShare | 100 | 1 |
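
Dynamic regret is measured against the best expert path with at most S switches, which the abstract says is evaluated exactly by dynamic programming. A minimal sketch of such an oracle is given below, assuming a T-by-N matrix of per-round expert losses; the function name `switching_oracle_loss` is hypothetical, and the normalization behind the "Normalized Dynamic Regret" metric is not specified here, so it is not reproduced.

```python
import numpy as np

def switching_oracle_loss(loss_matrix, max_switches):
    """Minimum cumulative loss over expert paths with at most
    `max_switches` switches, computed in O(T * S * N) time."""
    L = np.asarray(loss_matrix, dtype=float)
    T, N = L.shape
    # best[s, i]: cheapest path so far that ends at expert i
    # and uses at most s switches.
    best = np.tile(L[0], (max_switches + 1, 1))
    for t in range(1, T):
        # Option 1: stay on the same expert.
        new = best + L[t]
        if max_switches > 0:
            # Option 2: arrive from the cheapest path with budget s - 1.
            prev_min = best[:-1].min(axis=1, keepdims=True)
            new[1:] = np.minimum(new[1:], prev_min + L[t])
        best = new
    return float(best[max_switches].min())

def dynamic_regret(learner_loss, loss_matrix, max_switches):
    """Learner's cumulative loss minus the switching-oracle loss."""
    return learner_loss - switching_oracle_loss(loss_matrix, max_switches)
```

With `max_switches=0` the oracle reduces to the best single expert in hindsight, so the same function also recovers static regret as a special case.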