
Spatially Robust Inference with Predicted and Missing at Random Labels

About

When outcome data are expensive or onerous to collect, scientists increasingly substitute predictions from machine learning and AI models for unlabeled cases, a practice that has consequences for downstream statistical inference. While recent methods provide valid uncertainty quantification under independent sampling, real-world applications involve missing at random (MAR) labeling and spatial dependence. For inference in this setting, we propose a doubly robust estimator with cross-fit nuisances. We show that cross-fitting induces fold-level correlation that distorts spatial variance estimators, producing unstable or overly conservative confidence intervals. To address this, we propose a jackknife spatial heteroscedasticity- and autocorrelation-consistent (HAC) variance correction that separates spatial dependence from fold-induced noise. Under standard identification and dependence conditions, the resulting intervals are asymptotically valid. Simulations and benchmark datasets show substantial improvement in finite-sample calibration, particularly under MAR labeling and clustered sampling.
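
To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the target is the mean outcome, uses a generic cross-fit augmented inverse-probability-weighted (AIPW) score as the doubly robust estimator, and computes a plain spatial HAC standard error with a product Bartlett kernel. The jackknife correction proposed in the paper is not reproduced here because the abstract does not specify it. All function names (`crossfit_aipw`, `spatial_hac_se`, and the helpers), the linear and logistic nuisance models, and the clipping threshold are illustrative assumptions.

```python
import numpy as np

def _design(X):
    # Add an intercept column to the feature matrix.
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    # Least-squares outcome model (an assumed, deliberately simple nuisance).
    return np.linalg.lstsq(_design(X), y, rcond=None)[0]

def fit_logistic(X, r, n_iter=25, ridge=1e-6):
    # Labeling-propensity model fitted by Newton's method with a tiny ridge.
    Z = _design(X)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        H = Z.T @ (Z * (p * (1.0 - p))[:, None]) + ridge * np.eye(Z.shape[1])
        beta += np.linalg.solve(H, Z.T @ (r - p) - ridge * beta)
    return beta

def crossfit_aipw(X, f, y, r, n_folds=5, seed=0):
    """Cross-fit AIPW estimate of E[Y]; y is only read where r == 1."""
    n = len(r)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    feats = np.column_stack([X, f])  # covariates plus the ML prediction
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        lab = train & (r == 1)
        # Nuisances are fitted on the other folds only.
        m = _design(feats[test]) @ fit_ols(feats[lab], y[lab])
        beta = fit_logistic(X[train], r[train])
        pi = 1.0 / (1.0 + np.exp(-_design(X[test]) @ beta))
        pi = np.clip(pi, 0.05, 1.0)  # assumed clipping to avoid huge weights
        resid = np.where(r[test] == 1, y[test] - m, 0.0)
        psi[test] = m + resid / pi
    return psi.mean(), psi

def spatial_hac_se(psi, coords, bandwidth):
    """Spatial HAC standard error of mean(psi) with a product Bartlett kernel."""
    n = len(psi)
    u = psi - psi.mean()
    K = np.ones((n, n))
    for j in range(coords.shape[1]):
        d = np.abs(coords[:, j][:, None] - coords[:, j][None, :])
        K *= np.clip(1.0 - d / bandwidth, 0.0, None)
    var = max(float(u @ K @ u), 0.0) / n**2
    return np.sqrt(var)
```

A call such as `est, psi = crossfit_aipw(X, f, y, r)` followed by `spatial_hac_se(psi, coords, bandwidth)` gives a point estimate and a standard error. The bandwidth is a tuning choice: as it shrinks toward zero the kernel matrix becomes the identity and the standard error reduces to the usual independent-sampling formula.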

Stephen Salerno, Zhenke Wu, Tyler McCormick • 2026

Related benchmarks

| Task | Dataset | Result (Coverage) | Rank |
| Prediction Interval Coverage | Malaria (MAR) | 94.4 | 12 |
| Prediction Interval Coverage | Forest MAR | 92.5 | 6 |
| Prediction Interval Coverage | Forest MCAR | 96.9 | 6 |
| Prediction Interval Coverage | Galaxies MCAR | 94.4 | 6 |
| Prediction Interval Coverage | Census income MCAR | 90.6 | 6 |
| Prediction Interval Coverage | Galaxies MAR | 87.5 | 6 |
| Prediction Interval Coverage | Census income MAR | 80.6 | 6 |
| Prediction Interval Coverage | Health+ MAR | 91.2 | 6 |
| Prediction Interval Coverage | Health+ MCAR | 90 | 6 |
