Batch Bayesian Active Learning with Partial Batch Label Sampling

About

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian active learning offers principled objectives with explainable intuition, including Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and Bayesian Active Learning by Disagreement (BALD). A key challenge for such methods is that they scale poorly to large batch sizes, leading either to computational challenges (BatchBALD) or to dramatic performance drops (top-$B$ selection). Here, using a particular formulation of Bayesian Decision Theory, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally on several datasets that ParBaLS EPIG gives superior performance for a fixed budget with Bayesian Logistic Regression on embeddings from large pre-trained models. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.
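The abstract names EPIG and top-$B$ selection but gives no formulas. As an illustration only, the sketch below estimates EPIG, i.e. EPIG(x) = E_{x*}[I(y; y* | x, x*)], from K posterior samples of a classifier. It then contrasts top-$B$ selection, which scores every point independently, with a greedy batch builder that samples a label for each point it has already chosen and conditions on that label before picking the next one. The label-conditioning scheme (importance reweighting of fixed posterior samples, one label sample per step) and every function name (`epig_scores`, `top_b`, `greedy_sampled_label_batch`) are our assumptions, loosely suggested by the method's name. This is not the authors' ParBaLS implementation, which lives in the linked repository.

```python
import numpy as np


def epig_scores(probs_pool, probs_target, weights=None):
    """Estimate EPIG(x) = E_{x*}[I(y; y* | x, x*)] for every pool point.

    probs_pool:   (K, N, C) array of p(y | x_n, theta_k) for K posterior samples
    probs_target: (K, M, C) array of p(y* | x*_m, theta_k) on M target inputs
    weights:      optional (K,) nonnegative weights over posterior samples
    """
    K = probs_pool.shape[0]
    w = np.full(K, 1.0 / K) if weights is None else weights / weights.sum()
    # Joint predictive p(y, y* | x_n, x*_m), shape (N, M, C, C)
    joint = np.einsum("k,knc,kmd->nmcd", w, probs_pool, probs_target)
    # Marginal predictives p(y | x_n) and p(y* | x*_m)
    marg_pool = np.einsum("k,knc->nc", w, probs_pool)
    marg_target = np.einsum("k,kmd->md", w, probs_target)
    indep = marg_pool[:, None, :, None] * marg_target[None, :, None, :]
    eps = 1e-12  # guards log(0)
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(2, 3))
    # Monte Carlo average over target inputs x* ~ p*(x*)
    return mi.mean(axis=1)


def top_b(scores, B):
    """Top-B selection: the B highest-scoring points, each scored independently."""
    return [int(i) for i in np.argsort(-scores)[:B]]


def greedy_sampled_label_batch(probs_pool, probs_target, B, rng):
    """Hypothetical greedy batch builder: after each pick, sample a label for it
    and condition on that label by reweighting the posterior samples."""
    K = probs_pool.shape[0]
    log_w = np.zeros(K)
    batch = []
    for _ in range(B):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        scores = epig_scores(probs_pool, probs_target, w)
        scores[batch] = -np.inf  # never pick the same point twice
        n = int(np.argmax(scores))
        batch.append(n)
        # Sample y_n from the current (reweighted) posterior predictive
        p = w @ probs_pool[:, n, :]
        y = rng.choice(p.shape[0], p=p / p.sum())
        # Bayes' rule on the samples: w_k is multiplied by p(y_n | x_n, theta_k)
        log_w += np.log(probs_pool[:, n, y] + 1e-12)
    return batch
```

For example, with `rng = np.random.default_rng(0)`, `greedy_sampled_label_batch(probs_pool, probs_target, 10, rng)` returns 10 distinct pool indices. Reweighting a fixed set of posterior samples keeps the sketch cheap, but the weights degenerate as the batch grows. A model such as Bayesian logistic regression could instead refit its posterior on the sampled labels, at a higher cost per step.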

Kangping Hu, Stephen Mussmann • 2025

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	CIFAR-10 (test)	Accuracy	95.4	1063
Classification	CIFAR10 (test)	Accuracy	85.52	331
Text Classification	AG News (test)	Accuracy	88	326
Text Classification	Yelp (test)	Accuracy	80.89	100
Image Classification	fMoW (test)	Top-1 Accuracy	98.41	60
Classification	CivilComments (test)	Average Accuracy	85.67	51
News Classification	AG News (test)	Accuracy	84.11	48
Classification	Airline Passenger Satisfaction (test)	Accuracy	89.38	45
Image Classification	iWildCam (test)	Accuracy	89.77	45
Classification	Credit Card Fraud (test)	Accuracy	93.46	45

