Mixture Proportion Estimation and PU Learning: A Modern Approach
About
Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE), for MPE; and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples. Our final algorithm, (TED)$^n$, alternates between the two procedures, significantly improving both our mixture proportion estimator and classifier.
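To make the MPE subtask concrete, the following is a minimal sketch of a threshold-based estimator in the spirit of BBE, not the paper's reference implementation. It assumes a classifier has already been trained to separate positive from unlabeled data and that its scores on held-out positive and unlabeled examples are available. The function name, the parameters `delta` and `gamma`, and the Hoeffding-style slack term are assumptions made for illustration; the exact bound and constants used in the paper may differ.

```python
import numpy as np

def estimate_mixture_proportion(pos_scores, unl_scores, delta=0.1, gamma=0.01):
    """Estimate the fraction of positives in the unlabeled data (hypothetical helper).

    For each threshold c, compare the fraction of unlabeled scores >= c with the
    fraction of positive scores >= c. If the top bin contains (almost) only
    positives, that ratio approximates the mixture proportion.
    """
    pos_sorted = np.sort(np.asarray(pos_scores, dtype=float))
    unl_sorted = np.sort(np.asarray(unl_scores, dtype=float))
    n_p, n_u = len(pos_sorted), len(unl_sorted)

    # Candidate thresholds: every observed score.
    thresholds = np.unique(np.concatenate([pos_sorted, unl_sorted]))

    # Fraction of each sample scoring at or above each threshold.
    q_p = 1.0 - np.searchsorted(pos_sorted, thresholds, side="left") / n_p
    q_u = 1.0 - np.searchsorted(unl_sorted, thresholds, side="left") / n_u

    # Discard thresholds above every positive score (ratio undefined there).
    valid = q_p > 0
    q_p, q_u = q_p[valid], q_u[valid]

    # Assumed Hoeffding-style slack: penalizes bins with few positive examples.
    slack = np.sqrt(np.log(4 / delta) / (2 * n_p)) + np.sqrt(np.log(4 / delta) / (2 * n_u))
    upper_bound = (q_u + (1 + gamma) * slack) / q_p

    # Pick the threshold with the smallest upper bound, report the plain ratio there.
    best = np.argmin(upper_bound)
    return float(min(1.0, q_u[best] / q_p[best]))
```

The slack term is what keeps the search from choosing a very high threshold where only a handful of positives remain and the ratio is dominated by sampling noise. The absolute difference between such an estimate and the true proportion is the "absolute estimation error" reported in the benchmarks.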
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| PvN classification | Binarized CIFAR | Accuracy | 82.7 | 18 |
| Mixture Proportion Estimation | Binarized CIFAR | Absolute Estimation Error | 0.026 | 17 |
| Mixture Proportion Estimation | CIFAR Dog vs Cat | Absolute Estimation Error | 0.066 | 12 |
| PvN classification | CIFAR Dog vs Cat | Accuracy | 75.2 | 12 |
| Scribble-supervised cardiac segmentation | MyoPS | Dice (Scar) | 28.8 | 8 |
| Mixture Proportion Estimation | Binarized MNIST | Absolute Estimation Error (%) | 2.4 | 7 |
| Mixture Proportion Estimation | MNIST 17 | Absolute Estimation Error | 0.3 | 7 |
| Mixture Proportion Estimation | UCI CONCRETE (test) | Absolute Estimation Error | 0.071 | 6 |
| Mixture Proportion Estimation | UCI mushroom (test) | Absolute Estimation Error | 0.001 | 6 |
| Mixture Proportion Estimation | UCI pageblock (test) | Absolute Estimation Error | 0.007 | 6 |