Doubly robust off-policy evaluation with shrinkage
About
We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
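To make the idea concrete, here is a minimal sketch of a doubly robust value estimate with clipped importance weights, the simplest of the shrinkage schemes described above. This assumes numpy; the function and argument names are illustrative, not taken from the paper's code, and clipping at a threshold `lam` stands in for the general MSE-bound-minimizing shrinkage.

```python
import numpy as np

def dr_estimate(rewards, w, q_logged, q_target, lam=np.inf):
    """Doubly robust off-policy value estimate with clipped importance weights.

    rewards  : observed rewards r_i from the logging policy
    w        : importance weights pi(a_i | x_i) / mu(a_i | x_i)
    q_logged : reward-model predictions q(x_i, a_i) for the logged actions
    q_target : E_{a ~ pi}[q(x_i, a)], the model's value under the target policy
    lam      : clipping threshold; np.inf recovers the plain DR estimator
    """
    # Shrinking (here: clipping) large weights trades a little bias
    # for a potentially large variance reduction in finite samples.
    w_clip = np.minimum(w, lam)
    return np.mean(q_target + w_clip * (rewards - q_logged))
```

With `lam=np.inf` this is the standard doubly robust estimator; sweeping `lam` over a grid and picking the value minimizing an estimated MSE bound is the adaptive element of the framework.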
Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík • 2019
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Off-Policy Learning | Wiki10-31K Synthetic tau=0.5 (test) | P@5 | 0.5526 | 14 |
| Off-Policy Learning | Wiki10-31K Synthetic tau=1 (test) | P@5 | 54.99 | 14 |
| Off-Policy Learning | Wiki10-31K Synthetic tau=2 (test) | P@5 | 0.5347 | 14 |
| Recommendation | Yahoo! R3 (test) | P@5 | 28.43 | 13 |
| Recommendation | Coat (test) | Precision@5 | 0.279 | 13 |
| Recommendation | KuaiRec (test) | Precision@50 | 87.44 | 13 |
| Off-policy Evaluation | Digits (UCI) | MSE | 0.0384 | 12 |
| Off-policy Evaluation | PenDigits (UCI) | MSE | 0.0138 | 6 |
| Off-policy Evaluation | SatImage (UCI) | MSE | 0.0078 | 6 |
| Off-policy Evaluation | Letter (UCI) | MSE | 0.2363 | 6 |
Showing 10 of 16 rows