Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

About

Data Shapley has recently been proposed as a principled framework to quantify the contribution of individual data points in machine learning. It can effectively identify helpful or harmful data points for a learning algorithm. In this paper, we propose Beta Shapley, a substantial generalization of Data Shapley. Beta Shapley arises naturally by relaxing the efficiency axiom of the Shapley value, which is not critical in machine learning settings. Beta Shapley unifies several popular data valuation methods and includes Data Shapley as a special case. Moreover, we prove that Beta Shapley has several desirable statistical properties and propose efficient algorithms to estimate it. We demonstrate that Beta Shapley outperforms state-of-the-art data valuation methods on several downstream ML tasks: 1) detecting mislabeled training data; 2) learning with subsamples; and 3) identifying points whose addition or removal has the largest positive or negative impact on the model.
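To make the efficiency-relaxation idea concrete, here is a minimal Python sketch of a semivalue whose weights over subset sizes follow a Beta(α, β) distribution, computed exactly by enumerating all subsets. The weight formula w(j) = n·B(j+β−1, n−j+α)/B(α, β), the toy utility, and every function name (`cardinality_weights`, `beta_semivalues`, `toy_utility`) are illustrative assumptions, not the authors' code. With α = β = 1 every subset size receives equal total weight and the values reduce to Data Shapley, which sums to U(D) − U(∅). With α = 16, β = 1 small subsets dominate and that sum constraint (the efficiency axiom) no longer holds.

```python
from itertools import combinations
from math import comb, exp, lgamma


def log_beta_fn(a: float, b: float) -> float:
    """Log of the Beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def cardinality_weights(n: int, alpha: float, beta: float) -> list[float]:
    """Assumed Beta(alpha, beta) weights w(j) for j = 1..n, where j = |S| + 1.

    w(j) = n * B(j + beta - 1, n - j + alpha) / B(alpha, beta).
    With alpha = beta = 1 this equals 1 / C(n-1, j-1), the Data Shapley weight.
    """
    return [
        n * exp(log_beta_fn(j + beta - 1, n - j + alpha) - log_beta_fn(alpha, beta))
        for j in range(1, n + 1)
    ]


def beta_semivalues(points, utility, alpha=1.0, beta=1.0):
    """Exact Beta-weighted semivalue of each point by enumerating all subsets (tiny n only)."""
    n = len(points)
    w = cardinality_weights(n, alpha, beta)
    values = []
    for i, z in enumerate(points):
        others = points[:i] + points[i + 1:]
        value = 0.0
        for j in range(1, n + 1):
            # Delta_j: mean marginal contribution of z over subsets S of size j - 1
            subsets = list(combinations(others, j - 1))
            delta = sum(utility(set(s) | {z}) - utility(set(s)) for s in subsets) / len(subsets)
            # Total weight on size j - 1 is C(n-1, j-1) * w(j) / n; these sum to 1 over j
            value += comb(n - 1, j - 1) * w[j - 1] / n * delta
        values.append(value)
    return values


# Hypothetical toy data set: three clean points and one harmful point.
POINTS = ["a", "b", "c", "noisy"]


def toy_utility(subset: set) -> float:
    """Hypothetical utility: gains saturate after two clean points; 'noisy' always costs 0.5."""
    clean = len(subset & {"a", "b", "c"})
    return min(clean, 2) - 0.5 * ("noisy" in subset)


if __name__ == "__main__":
    shapley = beta_semivalues(POINTS, toy_utility, 1.0, 1.0)
    print(shapley)    # ~[0.667, 0.667, 0.667, -0.5]; sums to U(D) - U(empty) = 1.5
    beta_16_1 = beta_semivalues(POINTS, toy_utility, 16.0, 1.0)
    print(beta_16_1)  # clean points ~0.993 each; sum ~2.48, so efficiency no longer holds
```

Enumeration is exponential in n, so this is only useful for checking small cases; the estimation algorithms the abstract mentions are not reproduced here.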

Yongchan Kwon, James Zou • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Label Noise Identification | MNIST (train) | AUC | 0.845 | 15
High-value data removal | CIFAR10 binarized (test) | -- | -- | 11
High-value data removal | Covertype (test) | Weighted Accuracy Drop | 11.2 | 8
High-value data removal | MNIST-2 binarized (test) | Weighted Accuracy Drop | 0.006 | 8
High-value data removal | CPU (test) | Weighted Accuracy Drop | 0.021 | 8
High-value data removal | Diabetes (test) | Weighted Accuracy Drop | 2.2 | 8
High-value data removal | FMNIST binarized (test) | Weighted Accuracy Drop | 3.2 | 8
High-value data removal | MNIST-10 multi-class (test) | Weighted Accuracy Drop | 6.4 | 8
Noisy label detection | Diabetes | AUC | 0.435 | 8
High-value data removal | Click (test) | Weighted Accuracy Drop | 0.4 | 8

(10 of 18 rows shown.)
