Batch Bayesian Active Learning with Partial Batch Label Sampling
About
Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with no clear choice of which to use. Bayesian active learning offers principled objectives with explainable intuition, including Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and Bayesian Active Learning by Disagreement (BALD). A key challenge for such methods is that they scale poorly to large batch sizes, leading either to computational challenges (BatchBALD) or to dramatic performance drops (top-$B$ selection). Here, using a particular formulation of Bayesian Decision Theory, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally on several datasets that ParBaLS EPIG gives superior performance for a fixed budget when using Bayesian Logistic Regression on embeddings from large pre-trained models. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.
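To make the objectives above more concrete, here is a minimal sketch. It is not the authors' implementation; the linked repository is the reference for that. The sketch assumes that the posterior over a multiclass logistic-regression weight matrix is represented by `K` samples `W`. In the usage lines these are random normals, a placeholder rather than a fitted posterior. It estimates BALD and EPIG by Monte Carlo and contrasts plain top-$B$ selection with a hypothetical "ParBaLS-style" greedy loop. In that loop, our assumption is that each selected point receives a label sampled from the current predictive distribution, and that conditioning on this label is approximated by importance-reweighting the posterior samples. Several such label trajectories are kept, and their EPIG scores are averaged. All function names (`bald`, `epig`, `top_b`, `parbals_style_epig`) are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_probs(X, W):
    # X: (N, D) embeddings; W: (K, D, C) posterior weight samples (assumed given).
    # Returns per-sample class probabilities with shape (K, N, C).
    return softmax(np.einsum("nd,kdc->knc", X, W))

def entropy(p, axis=-1):
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

def bald(probs, w):
    # BALD = H[E_theta p(y|x,theta)] - E_theta H[p(y|x,theta)], with sample weights w (K,).
    mean = np.einsum("k,knc->nc", w, probs)
    return entropy(mean) - np.einsum("k,kn->n", w, entropy(probs))

def epig(probs_pool, probs_target, w):
    # Joint predictive p(y, y*) = sum_k w_k p_k(y|x) p_k(y*|x*), shape (N, M, C, C).
    joint = np.einsum("k,knc,kmd->nmcd", w, probs_pool, probs_target)
    p_y = np.einsum("k,knc->nc", w, probs_pool)[:, None, :, None]
    p_ys = np.einsum("k,kmd->md", w, probs_target)[None, :, None, :]
    # Mutual information I(y; y* | x, x*) for every (pool, target) pair.
    mi = np.sum(joint * (np.log(joint + 1e-12) - np.log(p_y * p_ys + 1e-12)), axis=(2, 3))
    return mi.mean(axis=1)  # average over target inputs x*

def top_b(scores, B):
    # Naive batch selection: the B highest individual scores.
    return [int(i) for i in np.argsort(-scores)[:B]]

def parbals_style_epig(X_pool, X_target, W, B, n_label_samples=8, seed=0):
    # Hypothetical sketch of greedy selection with sampled partial-batch labels.
    rng = np.random.default_rng(seed)
    K = W.shape[0]
    probs_pool = predictive_probs(X_pool, W)
    probs_target = predictive_probs(X_target, W)
    # One normalized weight vector over posterior samples per label trajectory.
    weights = np.full((n_label_samples, K), 1.0 / K)
    chosen = []
    for _ in range(B):
        scores = np.mean([epig(probs_pool, probs_target, w) for w in weights], axis=0)
        if chosen:
            scores[chosen] = -np.inf  # never pick the same point twice
        i = int(np.argmax(scores))
        chosen.append(i)
        for s in range(n_label_samples):
            # Sample a hypothetical label from this trajectory's predictive...
            p = weights[s] @ probs_pool[:, i, :]
            y = rng.choice(p.shape[0], p=p / p.sum())
            # ...then condition on it by reweighting samples by their likelihood.
            weights[s] *= probs_pool[:, i, y]
            weights[s] /= weights[s].sum()
    return chosen

# Usage with placeholder data (random "posterior" samples, not a fitted model).
rng = np.random.default_rng(0)
D, C, K = 5, 3, 64
X_pool = rng.normal(size=(40, D))
X_target = rng.normal(size=(20, D))
W = rng.normal(size=(K, D, C))
uniform = np.full(K, 1.0 / K)
naive = top_b(epig(predictive_probs(X_pool, W), predictive_probs(X_target, W), uniform), B=5)
batch = parbals_style_epig(X_pool, X_target, W, B=5)
print(naive, batch)
```

In this sketch, conditioning on sampled labels is what discourages later picks from repeating the information of earlier ones, which top-$B$ ignores. Reweighting a fixed set of samples is a cheap stand-in for refitting the model, and it degrades as $B$ grows because the weights collapse onto a few samples. A Laplace-approximation or MCMC refit after each sampled label would be a more faithful, but slower, choice.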
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy | 95.4 | 882 |
| Classification | CIFAR10 (test) | Accuracy | 85.52 | 331 |
| Text Classification | AG News (test) | Accuracy | 88 | 293 |
| Text Classification | Yelp (test) | Accuracy | 80.89 | 100 |
| Image Classification | fMoW (test) | Top-1 Accuracy | 98.41 | 60 |
| Classification | CivilComments (test) | Average Accuracy | 85.67 | 51 |
| News Classification | AG News (test) | Accuracy | 84.11 | 48 |
| Classification | Airline Passenger Satisfaction (test) | Accuracy | 89.38 | 45 |
| Image Classification | iWildCam (test) | Accuracy | 89.77 | 45 |
| Classification | Credit Card Fraud (test) | Accuracy | 93.46 | 45 |