
Towards Efficient Data Valuation Based on the Shapley Value

About

"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.
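The exponential cost mentioned above can be made concrete with a small sketch. Assume a utility U(S) that scores a model trained on a subset S of the N training points. The Shapley value of point i is then s_i = Σ_{S ⊆ I∖{i}} [U(S ∪ {i}) − U(S)] / (N · C(N−1, |S|)), a weighted average of i's marginal contributions over all subsets of the other points. In the sketch below, the data, the 1-nearest-neighbor utility `knn_accuracy`, and the function names are hypothetical illustrations. `exact_shapley` enumerates every subset. `permutation_shapley` uses generic Monte Carlo permutation sampling, a standard baseline estimator and not one of the algorithms proposed in the paper.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, List

# A utility maps a subset of training indices to a score (higher is better).
Utility = Callable[[FrozenSet[int]], float]

# Hypothetical toy data: (feature, label) pairs on a 1-D line.
TRAIN = [(0.0, 0), (1.0, 0), (5.0, 1), (6.0, 1), (2.0, 1)]
VAL = [(0.5, 0), (1.5, 0), (5.5, 1), (6.5, 1)]


def knn_accuracy(subset: FrozenSet[int]) -> float:
    """Validation accuracy of a 1-nearest-neighbor classifier fit on `subset`."""
    if not subset:
        return 0.0  # convention (assumed): an empty training set scores zero
    correct = 0
    for x, y in VAL:
        # Nearest training point; ties go to the lower index.
        nearest = min(subset, key=lambda j: (abs(TRAIN[j][0] - x), j))
        correct += int(TRAIN[nearest][1] == y)
    return correct / len(VAL)


def exact_shapley(n: int, utility: Utility) -> List[float]:
    """Exact Shapley values: enumerates all 2^(n-1) subsets of the others per point."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = 1.0 / (n * math.comb(n - 1, size))
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


def permutation_shapley(
    n: int, utility: Utility, num_permutations: int, seed: int = 0
) -> List[float]:
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        prefix: FrozenSet[int] = frozenset()
        prev = utility(prefix)
        for i in order:
            prefix = prefix | {i}
            cur = utility(prefix)
            totals[i] += cur - prev  # contribution of i given its predecessors
            prev = cur
    return [t / num_permutations for t in totals]


if __name__ == "__main__":
    exact = exact_shapley(len(TRAIN), knn_accuracy)
    approx = permutation_shapley(len(TRAIN), knn_accuracy, num_permutations=2000)
    print([round(v, 3) for v in exact])  # [0.208, 0.333, 0.167, 0.167, 0.125]
    print([round(v, 3) for v in approx])
```

Within each sampled permutation the marginal contributions telescope to U(all) − U(∅), so the estimates always sum to the full-data utility, just as the exact values do. The exact routine's cost doubles with every added training point, whereas the sampling estimator's cost grows with the number of permutations drawn. This trade-off is why approximation becomes necessary at realistic dataset sizes.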

Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos (2019)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | Fashion MNIST | Accuracy | 86.8 | 300 |
| Data Valuation | UCI ADULT (train) | p1 | 0.22 | 5 |
| Data Valuation | Synthetic, n = 3,000 | Wall-clock time (s) | 1.07e+3 | 5 |
| Data Valuation | UCI Adult | Wall-clock time (min) | 48 | 5 |
| Data Valuation | Fashion MNIST | Wall-clock time (hr) | 3.6 | 5 |
| Data Valuation | Criteo-1B | Wall-clock time (hr) | 29.1 | 5 |
| Data Shapley Value Estimation | MNIST | Execution time | 2.12e+5 | 5 |
| Data Shapley Value Estimation | Iris | Time | 1.45e+3 | 5 |
| Data Shapley Value Estimation | Breast cancer | Time | 1.02e+5 | 5 |
| Data Shapley Value Estimation | Cora | Time | 2.01e+5 | 5 |

(10 of 13 benchmark rows shown.)
