
Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing

About

Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference are not directly applicable. Many existing works circumvent non-differentiability by making the unrealistic assumption of zero probability of treatment non-response, i.e., that every unit responds (either positively or negatively) to an assigned treatment. Further, works that avoid this assumption rely on refitting nuisance models a number of times proportional to the sample size. In this paper, we construct and analyze a simple, softmax smoothing-based estimator for the value of an optimal treatment policy. Our estimator applies in both static and dynamic treatment regimes, requires fitting only a constant number of nuisance models, and is statistically efficient when there is zero probability of non-response to treatment. Also, while our estimator does not require semi-parametric restrictions, it can exploit them when they exist. We further show how our softmax smoothing approach can be used to estimate general parameters that are specified as a maximum of scores involving nuisance components, and consider conditional Balke and Pearl bounds and $L^1$ calibration error as salient examples.
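
The abstract does not spell out the estimator, so what follows is only a minimal sketch under our own assumptions, not the authors' method. We take "softmax smoothing" to mean replacing the non-differentiable $\max_a \mu(x, a)$ by the softmax-weighted average $\sum_a w_a(x)\,\mu(x, a)$ with $w(x) = \mathrm{softmax}(\beta\,\mu(x, \cdot))$, and we debias the plug-in with a one-step (AIPW-style) correction built from the exact gradient of that smooth map. The function names, the temperature `beta`, the oracle nuisances in the demo, and the Wald interval are all hypothetical illustration choices; cross-fitting is omitted.

```python
import numpy as np


def softmax_weights(mu: np.ndarray, beta: float) -> np.ndarray:
    """Row-wise softmax of beta * mu; mu has shape (n, K)."""
    z = beta * mu
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)


def smoothed_max(mu: np.ndarray, beta: float) -> np.ndarray:
    """Softmax-weighted average sum_a w_a * mu_a; tends to max_a mu_a as beta grows."""
    return (softmax_weights(mu, beta) * mu).sum(axis=1)


def smoothed_max_grad(mu: np.ndarray, beta: float) -> np.ndarray:
    """Exact gradient of smoothed_max w.r.t. mu: w_b * (1 + beta * (mu_b - f))."""
    w = softmax_weights(mu, beta)
    f = (w * mu).sum(axis=1, keepdims=True)
    return w * (1.0 + beta * (mu - f))


def one_step_optimal_value(y, a, mu_hat, e_hat, beta):
    """Hypothetical one-step estimate of E[smoothed_max(mu(X))] with a 95% Wald CI.

    y: (n,) outcomes; a: (n,) integer treatments in {0, ..., K-1};
    mu_hat, e_hat: (n, K) outcome regressions and propensity scores.
    """
    n, k = mu_hat.shape
    onehot = np.eye(k)[a]  # indicator 1{A_i = b}
    # IPW residual, nonzero only in the column of the observed treatment
    resid = (y[:, None] - mu_hat) * onehot / e_hat
    psi = smoothed_max(mu_hat, beta) + (smoothed_max_grad(mu_hat, beta) * resid).sum(axis=1)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return est, (est - 1.96 * se, est + 1.96 * se)


# Demo: arm 0 has mean 0, arm 1 has mean x, so the true optimal value is
# E[max(0, X)] = 1/4 for X ~ Uniform(-1, 1). Nuisances are the true mu and e (oracle).
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)
mu = np.column_stack([np.zeros(n), x])
e = np.full((n, 2), 0.5)  # randomized treatment assignment
a = rng.integers(0, 2, n)
y = mu[np.arange(n), a] + rng.normal(size=n)
est, ci = one_step_optimal_value(y, a, mu, e, beta=50.0)
print(est, ci)
```

In practice the nuisances would be estimated (typically with cross-fitting), and `beta` would need to grow with the sample size for the smoothing bias to vanish; the appropriate rate is not addressed by this sketch.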

Justin Whitehouse, Qizhao Chen, Morgane Austern, Vasilis Syrgkanis · 2025

Related benchmarks

Task | Dataset | Result | Rank
Instrumental Variable Estimation | STAR, strong instrument, math scores, small vs. regular class sizes | Validity Score: 1 | 6
Partial identification of causal effects | Synthetic binary-outcome data, ground-truth bounds known | Validity: 100 | 6
Partial identification of causal effects | Jobs semi-synthetic, RCT-derived labels | Validity: 90 | 6
Causal effect estimation | STAR math scores, Regular+Aide vs. Regular class sizes (weak instrument, ρ ≈ 0.28) | Validity: 1 | 6
Causal effect estimation | STAR math scores, Regular+Aide vs. Regular class sizes (strong instrument, ρ ≈ 0.89) | Validity: 1 | 6
Causal effect estimation | Project STAR reading scores, weak instrument | Validity: 1 | 6
Causal effect estimation | Project STAR reading scores, strong instrument | Validity: 1 | 6
Instrumental Variable Estimation | Airplane demand, modified binary (n = 2048 samples) | Validity: 1 | 6
Instrumental Variable Estimation | STAR math scores, small vs. regular class sizes, weak instrument, ρ(Z, T) ≈ 0.29 | Validity: 100 | 6
Partial identification under instrumental variables | STAR reading scores, small vs. regular class size, weak instrument, ρ(Z, T) ≈ 0.29 | Validity: 1 | 6
(10 of 11 rows shown)
