Distribution-Free Predictive Inference For Regression

About

We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (the length of the resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package, conformalInference, that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
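The abstract describes split conformal inference only in words, so here is a minimal sketch of the split variant in Python with NumPy. It is not the paper's implementation (that is the R package conformalInference) and makes several assumptions of its own: a 50/50 random split of the data, absolute residuals as the conformity score, and ordinary least squares as a stand-in for "any estimator of the regression function". The names split_conformal and least_squares are hypothetical. In this sketch the band half-width d is the k-th smallest calibration residual, with k = ⌈(n₂ + 1)(1 − α)⌉ and n₂ the size of the calibration half.

```python
import math

import numpy as np


def least_squares(x, y):
    """Hypothetical base estimator: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Return the fitted regression function mu_hat as a callable.
    return lambda z: np.column_stack([np.ones(len(z)), z]) @ beta


def split_conformal(x, y, x_new, fit, alpha=0.1, seed=0):
    """Split conformal band: fit on one half, calibrate residuals on the other."""
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)
    train, calib = perm[: n // 2], perm[n // 2 :]   # assumption: an even 50/50 split
    mu_hat = fit(x[train], y[train])                # any regression estimator can go here
    resid = np.abs(y[calib] - mu_hat(x[calib]))     # absolute residuals on the held-out half
    n_calib = len(calib)
    k = math.ceil((n_calib + 1) * (1 - alpha))      # rank of the conformal quantile
    # With too few calibration points, the only valid band is the whole real line.
    d = np.inf if k > n_calib else np.sort(resid)[k - 1]
    pred = mu_hat(x_new)
    return pred - d, pred + d


# Usage: linear signal with heavy-tailed (Student t, 3 df) noise.
rng = np.random.default_rng(1)
beta_true = np.array([1.0, -2.0, 0.5])
x = rng.normal(size=(1000, 3))
y = x @ beta_true + rng.standard_t(df=3, size=1000)
x_test = rng.normal(size=(2000, 3))
y_test = x_test @ beta_true + rng.standard_t(df=3, size=2000)

lo, hi = split_conformal(x, y, x_test, least_squares, alpha=0.1)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"empirical coverage: {coverage:.3f}, band width: {hi[0] - lo[0]:.3f}")
```

Because every test point receives the same half-width d, this band has constant length; the locally weighted extension mentioned in the abstract is what allows the length to vary with x. Replacing least_squares with any other fit(x, y) callable leaves the coverage argument unchanged, since it relies only on the data being i.i.d. (exchangeable), not on the fitted model being correct.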

Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman • 2016

Related benchmarks

Task                          Dataset                            Result                  Rank
Feature Significance Testing  Synthetic Regression N=5e5 (test)  Total Rejections: 1     92
Feature Significance Testing  Synthetic classification dataset   Total Rejections: 10    26
Object size area estimation   H&E 100 random splits              Interval Size: 6.30e+3  18
Object size area estimation   PolyP (100 random splits)          Interval Size: 1.24e+4  18
Conformal Prediction          meps 21 (test)                     Average Length: 2.063   18
Object size area estimation   Nodule TN3K (100 random splits)    Interval Size: 4.59e+3  18
Object size area estimation   Skin Lesion (100 random splits)    Interval Size: 1.12e+3  18
Conformal Prediction          H&E                                Interval Size: 2.00e+3  16
Conformal Prediction          Skin Lesion                        Interval Size: 1.91e+3  16
Conformal Prediction          blog (test)                        --                      14

Showing 10 of 37 rows.
