Post-ADC Inference: Valid Inference After Active Data Collection

About

The validity of statistical inference depends critically on how data are collected. When data gathered through active data collection (ADC) are reused for a post-hoc inferential task, conventional inference can fail because the sampling is adaptively biased toward regions favored by the collection strategy. This issue is especially pronounced in black-box optimization, where sequential model-based optimization (SMBO) methods such as the tree-structured Parzen estimator (TPE) and Gaussian process upper confidence bound (GP-UCB) preferentially concentrate evaluations in promising regions. We study statistical inference on actively collected data when the inferential target is constructed in a data-dependent manner after data collection. To enable valid inference in this setting, we propose post-ADC inference, a framework that accounts for the biases arising from both the active data collection process and the subsequent data-driven target construction. Our method builds on selective inference and provides valid $p$-values and confidence intervals that correct for both sources of bias. The framework applies to a broad class of ADC processes because it imposes assumptions only on the observation noise, without requiring any assumptions on the underlying black-box function or on the surrogate model used by the SMBO algorithm. Empirical results show that post-ADC inference provides valid inference for data collected by GP-UCB and TPE.
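The two sources of bias described above can be illustrated with a toy simulation. This is a minimal sketch, not the post-ADC inference procedure from the paper: it assumes Gaussian noise with known σ, replaces GP-UCB/TPE with a hypothetical two-stage collection rule (observe every arm once, then spend the remaining budget on the arm with the largest first observation), and applies the standard polyhedral-lemma construction from the selective inference literature, in which the selective $p$-value is a truncated-normal tail probability conditioned on the selection event $\{Ay \le b\}$. All function names and parameters are illustrative.

```python
import math
import random


def norm_sf(x):
    """Standard normal survival function 1 - Phi(x); also handles +/-inf."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def truncation_interval(A, b, eta, y):
    """Polyhedral lemma for isotropic Gaussian noise: given that A y <= b holds,
    return (t, v_lo, v_hi) with t = eta.y and [v_lo, v_hi] its truncation limits."""
    eta_sq = sum(e * e for e in eta)
    t = sum(e * v for e, v in zip(eta, y))
    c = [e / eta_sq for e in eta]              # direction along which t moves y
    z = [v - ci * t for v, ci in zip(y, c)]    # part of y independent of t
    v_lo, v_hi = -math.inf, math.inf
    for row, b_i in zip(A, b):
        slope = sum(a * ci for a, ci in zip(row, c))
        resid = b_i - sum(a * zi for a, zi in zip(row, z))
        if slope > 0:
            v_hi = min(v_hi, resid / slope)
        elif slope < 0:
            v_lo = max(v_lo, resid / slope)
    return t, v_lo, v_hi


def selective_p_value(A, b, eta, y, sigma):
    """One-sided selective p-value for H0: eta.mu = 0 vs eta.mu > 0,
    conditional on the selection event A y <= b (noise ~ N(0, sigma^2 I))."""
    t, v_lo, v_hi = truncation_interval(A, b, eta, y)
    s = sigma * math.sqrt(sum(e * e for e in eta))   # sd of eta.y
    num = norm_sf(t / s) - norm_sf(v_hi / s)
    den = norm_sf(v_lo / s) - norm_sf(v_hi / s)
    return num / den


def simulate(num_trials=4000, K=5, n_extra=4, sigma=1.0, alpha=0.05, seed=0):
    """Rejection rates of naive vs selective tests when every true mean is 0."""
    rng = random.Random(seed)
    naive_rej = selective_rej = 0
    for _ in range(num_trials):
        # Stage 1: one noisy observation per arm.
        stage1 = [rng.gauss(0.0, sigma) for _ in range(K)]
        j = max(range(K), key=stage1.__getitem__)
        # Stage 2 (adaptive): the remaining budget goes to the selected arm j.
        stage2 = [rng.gauss(0.0, sigma) for _ in range(n_extra)]
        y = stage1 + stage2
        # Data-driven target: mean of all observations of arm j.
        w = 1.0 / (n_extra + 1)
        eta = [0.0] * K + [w] * n_extra
        eta[j] = w
        # Selection event "arm j was chosen": y_k - y_j <= 0 for every k != j.
        A, b = [], []
        for k in range(K):
            if k != j:
                row = [0.0] * (K + n_extra)
                row[k], row[j] = 1.0, -1.0
                A.append(row)
                b.append(0.0)
        # Naive one-sided z-test ignores how the data and target were chosen.
        t = sum(e * v for e, v in zip(eta, y))
        naive_rej += norm_sf(t / (sigma * math.sqrt(w))) < alpha
        selective_rej += selective_p_value(A, b, eta, y, sigma) < alpha
    return naive_rej / num_trials, selective_rej / num_trials


if __name__ == "__main__":
    naive_rate, selective_rate = simulate()
    print(f"naive: {naive_rate:.3f}  selective: {selective_rate:.3f}")
```

With these settings the naive test rejects noticeably more often than the nominal α = 0.05 even though every true mean is zero, while the selective $p$-value rejects at roughly the nominal rate. Because this toy rule's selection event is a single polyhedron, one truncation interval suffices; adaptive rules such as GP-UCB or TPE would generally induce more complex conditioning events.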

Shuichi Nishino, Tomohiro Shiraishi, Teruyuki Katsuoka, Ichiro Takeuchi • 2026

Related benchmarks

| Task | Dataset | Empirical rejection probability (%) | Rank |
|---|---|---|---|
| Signal Detection | Gas Turbine CO | 99.4 | 4 |
| Signal Detection | Gas Turbine NOx | 83.6 | 4 |
| Signal Detection | Power Plant | 100 | 4 |
| Signal Detection | Gas Turbine CO (d=3) | 99.9 | 4 |
| Signal Detection | Power Plant (real datasets, d=3) | 100 | 4 |
| Statistical Power Analysis | Concrete | 43.2 | 4 |
| Statistical Power Analysis | Gas Turbine CO | 99.5 | 4 |
| Statistical Power Analysis | Gas Turbine NOx | 91.8 | 4 |
| Statistical Power Analysis | Power Plant | 99.9 | 4 |
| Signal Detection | Concrete | 14.5 | 4 |

10 of 12 rows shown.
