
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

About

Sudden concept drift makes previously trained predictors unreliable, yet deciding when to retrain, and what post-drift data size is sufficient, is rarely addressed. We propose CALIPER, a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining. CALIPER exploits state dependence in streams generated by dynamical systems: we run a single-pass weighted local regression over the post-drift window and track a one-step proxy error as a function of a locality parameter $\theta$. Once an effective-sample-size gate is satisfied, a monotonically non-increasing trend in this error with increasing $\theta$ indicates that the data size is sufficiently informative for retraining. We also provide a theoretical analysis of our method and show that the algorithm has low per-update time and memory costs. Across datasets from four heterogeneous domains, three learner families, and two detectors, CALIPER consistently matches or exceeds the best fixed data size for retraining while incurring negligible overhead, and it often outperforms incremental updates. CALIPER closes the gap between drift detection and data-sufficient adaptation in streaming learning.
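To make the abstract's mechanics concrete, here is a minimal, hypothetical Python sketch of a data-only sufficiency test in the spirit described above. It assumes a Gaussian locality kernel, a Nadaraya-Watson one-step predictor over past states, and a mean-ESS gate; the paper's actual kernel, gate, and error definitions may differ, and the function name `caliper_test` is illustrative only.

```python
import numpy as np

def caliper_test(x, thetas=(0.1, 0.3, 1.0, 3.0), ess_min=1.0, tol=1e-9):
    """Illustrative data-only sufficiency test (not the paper's exact algorithm).

    x: 1-D post-drift window. For each locality parameter theta, fit a
    kernel-weighted one-step predictor over past states and record its mean
    squared one-step proxy error. Declare the window sufficient when the
    (mean) effective-sample-size gate passes for every theta and the proxy
    error is non-increasing as theta grows.
    Returns (sufficient, proxy_errors); proxy_errors is None if the gate fails.
    """
    x = np.asarray(x, dtype=float)
    errs = []
    for theta in thetas:
        sq_err, ess_vals = [], []
        for t in range(1, len(x) - 1):
            d2 = (x[:t] - x[t]) ** 2                 # squared distances to past states
            w = np.exp(-d2 / (2.0 * theta ** 2))     # Gaussian locality kernel (assumed)
            ess_vals.append(w.sum() ** 2 / (w ** 2).sum())  # effective sample size
            pred = np.dot(w, x[1 : t + 1]) / w.sum()        # weighted next-state average
            sq_err.append((pred - x[t + 1]) ** 2)
        if np.mean(ess_vals) < ess_min:
            return False, None                        # gate not satisfied: not decidable yet
        errs.append(float(np.mean(sq_err)))
    monotone = all(b <= a + tol for a, b in zip(errs, errs[1:]))
    return monotone, errs
```

The single pass over the window per `theta` keeps per-update cost low, matching the abstract's efficiency claim in spirit; a streaming implementation would maintain the weighted sums incrementally rather than recomputing them.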

Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai • 2026

Related benchmarks

Task                    Dataset     Metric  Result  Rank
Time Series Regression  MoCap       MSE     7.106   6
Time Series Regression  TEP         MSE     1.76    6
Time Series Regression  Dysts       MSE     0.432   6
Time Series Regression  automobile  MSE     1.948   6
