
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

About

Sudden concept drift makes previously trained predictors unreliable, yet deciding when to retrain, and what post-drift data size is sufficient, is rarely addressed. We propose CALIPER, a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining. CALIPER exploits state dependence in streams generated by dynamical systems: we run a single-pass weighted local regression over the post-drift window and track a one-step proxy error as a function of a locality parameter $\theta$. Once an effective-sample-size gate is satisfied, a monotonically non-increasing trend in this error with increasing $\theta$ indicates that the data size is sufficiently informative for retraining. We also provide a theoretical analysis of our method and show that the algorithm has low per-update time and memory costs. Across datasets from four heterogeneous domains, three learner families, and two detectors, CALIPER consistently matches or exceeds the best fixed data size for retraining while incurring negligible overhead, and it often outperforms incremental updates. CALIPER closes the gap between drift detection and data-sufficient adaptation in streaming learning.
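To make the abstract's mechanics concrete, here is a minimal, hypothetical Python sketch of a data-only sufficiency test in the spirit described above. It assumes a Gaussian locality kernel, a Nadaraya-Watson one-step predictor over past states, and a mean-ESS gate; the paper's actual kernel, gate, and error definitions may differ, and the function name `caliper_test` is illustrative only.

```python
import numpy as np

def caliper_test(x, thetas=(0.1, 0.3, 1.0, 3.0), ess_min=1.0, tol=1e-9):
    """Illustrative data-only sufficiency test (not the paper's exact algorithm).

    x: 1-D post-drift window. For each locality parameter theta, fit a
    kernel-weighted one-step predictor over past states and record its mean
    squared one-step proxy error. Declare the window sufficient when the
    (mean) effective-sample-size gate passes for every theta and the proxy
    error is non-increasing as theta grows.
    Returns (sufficient, proxy_errors); proxy_errors is None if the gate fails.
    """
    x = np.asarray(x, dtype=float)
    errs = []
    for theta in thetas:
        sq_err, ess_vals = [], []
        for t in range(1, len(x) - 1):
            d2 = (x[:t] - x[t]) ** 2                 # squared distances to past states
            w = np.exp(-d2 / (2.0 * theta ** 2))     # Gaussian locality kernel (assumed)
            ess_vals.append(w.sum() ** 2 / (w ** 2).sum())  # effective sample size
            pred = np.dot(w, x[1 : t + 1]) / w.sum()        # weighted next-state average
            sq_err.append((pred - x[t + 1]) ** 2)
        if np.mean(ess_vals) < ess_min:
            return False, None                        # gate not satisfied: not decidable yet
        errs.append(float(np.mean(sq_err)))
    monotone = all(b <= a + tol for a, b in zip(errs, errs[1:]))
    return monotone, errs
```

The single pass over the window per `theta` keeps per-update cost low, matching the abstract's efficiency claim in spirit; a streaming implementation would maintain the weighted sums incrementally rather than recomputing them.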

Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai • 2026

Related benchmarks

Task                    Dataset     Metric  Result  Rank
Time Series Regression  MoCap       MSE     7.106   6
Time Series Regression  TEP         MSE     1.76    6
Time Series Regression  Dysts       MSE     0.432   6
Time Series Regression  automobile  MSE     1.948   6
