Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing

About

Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches to inference are not directly applicable. Many existing works circumvent non-differentiability by making the unrealistic assumption of zero probability of treatment non-response, i.e., that every unit responds (either positively or negatively) to an assigned treatment. Further, works that do not rely on this assumption must refit nuisance models a number of times proportional to the sample size. In this paper, we construct and analyze a simple, softmax smoothing-based estimator for the value of an optimal treatment policy. Our estimator applies in both static and dynamic treatment regimes, requires fitting only a constant number of nuisance models, and is statistically efficient when there is zero probability of non-response to treatment. Also, while our estimator does not require semi-parametric restrictions, it can exploit them when they exist. We further show how our softmax smoothing approach can be used to estimate general parameters that are specified as a maximum of scores involving nuisance components, and we consider conditional Balke and Pearl bounds and $L^1$ calibration error as salient examples.
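To make the smoothing idea concrete, here is a minimal sketch, assuming K discrete treatments with an outcome regression mu(x, a) and propensities pi(a | x), so that the optimal value is E[max_a mu(X, a)]. The sketch replaces the max by a softmax-weighted average with inverse temperature beta (a tuning parameter chosen by hand here) and then applies a generic one-step (AIPW-style) correction built from the gradient of the smoothed functional. It is not necessarily the authors' estimator: their temperature schedule, cross-fitting scheme, and variance estimate may differ, and every function name below (`smoothed_max`, `one_step_value`, etc.) is hypothetical. Nuisances are passed in as arrays; in practice they would be fit on a held-out fold.

```python
import numpy as np

# Hypothetical helpers illustrating softmax smoothing of max_a mu(x, a).
# Arrays of shape (n, K): n units, K candidate treatments.


def softmax_weights(scores, beta):
    """Row-wise softmax of beta * scores."""
    z = beta * scores
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)


def smoothed_max(scores, beta):
    """Softmax-weighted average sum_a w_a * scores_a; never exceeds max_a scores_a."""
    w = softmax_weights(scores, beta)
    return (w * scores).sum(axis=1)


def smoothed_max_grad(scores, beta):
    """Gradient w.r.t. scores_b: w_b * (1 + beta * (scores_b - smoothed_max))."""
    w = softmax_weights(scores, beta)
    f = (w * scores).sum(axis=1, keepdims=True)
    return w * (1.0 + beta * (scores - f))


def one_step_value(mu_hat, propensity, A, Y, beta):
    """Generic one-step (AIPW-style) estimate of E[smoothed_max(mu(X, .))].

    mu_hat:     (n, K) outcome-regression predictions mu(X_i, a)
    propensity: (n, K) treatment probabilities pi(a | X_i), assumed bounded away from 0
    A:          (n,) integer treatments in {0, ..., K-1}
    Y:          (n,) observed outcomes
    Nuisances are assumed to be fit on a separate fold (cross-fitting not shown).
    """
    n, K = mu_hat.shape
    plug_in = smoothed_max(mu_hat, beta)
    grad = smoothed_max_grad(mu_hat, beta)
    onehot = np.eye(K)[A]
    # Inverse-propensity-weighted residual, nonzero only for the observed arm.
    resid = onehot / propensity * (Y[:, None] - mu_hat)
    psi = plug_in + (grad * resid).sum(axis=1)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return est, se


# Toy check with known nuisances: two arms, mu(x, 0) = 0, mu(x, 1) = x,
# X ~ Uniform(-1, 1), so the optimal value E[max(0, X)] equals 0.25.
rng = np.random.default_rng(0)
n = 20_000
X = rng.uniform(-1.0, 1.0, size=n)
mu = np.column_stack([np.zeros(n), X])
prop = np.full((n, 2), 0.5)
A = rng.integers(0, 2, size=n)
Y = mu[np.arange(n), A] + rng.normal(size=n)

est, se = one_step_value(mu, prop, A, Y, beta=50.0)
print(f"estimate {est:.3f}, 95% CI half-width {1.96 * se:.3f}")
```

Because the softmax average is a weighted mean of the scores, it never exceeds the true maximum; the gap shrinks as beta grows, at the cost of larger gradient terms near ties between arms. In the toy demo the true nuisances are plugged in, so the estimate can be compared directly with the known target of 0.25.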

Justin Whitehouse, Qizhao Chen, Morgane Austern, Vasilis Syrgkanis• 2025

Related benchmarks

Task	Dataset	Result
Instrumental Variable Estimation	STAR Strong instrument math scores Small vs. Regular class sizes	Validity Score 1	6
Partial identification of causal effects	Synthetic Binary-outcome ground-truth bounds known	Validity 100	6
Partial identification of causal effects	Jobs semi-synthetic RCT-derived labels	Validity 90	6
Causal effect estimation	STAR math scores Regular+Aide vs. Regular class sizes (Weak instrument ρ ≈ 0.28)	Validity 1	6
Causal effect estimation	STAR math scores Regular+Aide vs. Regular class sizes (Strong instrument ρ ≈ 0.89)	Validity 1	6
Causal effect estimation	Project STAR Reading scores, Weak instrument	Validity 1	6
Causal effect estimation	Project STAR Reading scores, Strong instrument	Validity 1	6
Instrumental Variable Estimation	Airplane demand modified binary (n=2048 samples)	Validity 1	6
Instrumental Variable Estimation	STAR math scores Small vs. Regular class sizes Weak instrument, ρ(Z, T) ≈ 0.29	Validity 100	6
Partial identification under instrumental variables	STAR small vs. regular class size reading scores Weak instrument ρ(Z, T) ≈ 0.29	Validity 1	6

