Active Learning for Regression Using Greedy Sampling

About

Regression problems are pervasive in real-world applications. Generally, a substantial number of labeled samples is needed to build a regression model with good generalization ability. However, it is often relatively easy to collect a large number of unlabeled samples, but time-consuming or expensive to label them. Active learning for regression (ALR) is a methodology for reducing the number of labeled samples by selecting the most beneficial ones to label instead of selecting them at random. This paper proposes two new ALR approaches based on greedy sampling (GS). The first approach (GSy) selects new samples to increase the diversity in the output space, and the second (iGS) selects new samples to increase the diversity in both the input and output spaces. Extensive experiments on 12 UCI and CMU StatLib datasets from various domains, and on 15 subjects in EEG-based driver drowsiness estimation, verified their effectiveness and robustness.

Dongrui Wu, Chin-Teng Lin, Jian Huang • 2018

Related benchmarks

Task	Dataset	Metric	Result	
Average Treatment Effect Estimation	Synthetic Data	Averaged MSE	6.13	54
Regression	Yacht	Normalized AUC (RMSE)	1.19	9
Regression	NO2	Normalized AUC of RMSE	0.91	9
Regression	Housing	Normalized AUC (RMSE)	0.81	9
Regression	pm10	Normalized AUC of RMSE	0.95	9
Regression	Airfoil	Normalized AUC of RMSE	0.98	9
Regression	Wine white	Normalized AUC (RMSE)	0.94	9
Regression	CPS	Normalized AUC (RMSE)	0.73	9
Regression	EE-Cooling	Normalized AUC of RMSE	0.95	9
Regression	Wine Red	Normalized AUC (RMSE)	0.88	9
