2D-Shapley: A Framework for Fragmented Data Valuation

About

Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance for enhancing the transparency of machine learning and for designing incentive systems for data sharing. Existing work has focused on evaluating data sources that share a feature space or a sample space. How to value fragmented data sources, each of which contains only a subset of the features and samples, remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on this counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley enables a range of new use cases, such as selecting useful data fragments, interpreting sample-wise data values, and diagnosing data issues at a fine-grained level.
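The abstract does not spell out the counterfactual or the resulting value, so the following is only a minimal sketch of one way to read it. It assumes the counterfactual for a cell (row i, column j) is a double difference of a utility function over row and column subsets, averaged over random orderings of rows and columns. The names `two_d_shapley_mc`, `utility`, and `toy_utility` are hypothetical, and the utility of an empty row or column set is assumed to be 0.

```python
import numpy as np

def two_d_shapley_mc(utility, n_rows, n_cols, n_perms=200, seed=0):
    """Permutation-sampling estimate of a value for every cell (i, j).

    utility(rows, cols) -> float scores the sub-matrix picked out by the
    given row indices and column indices (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    values = np.zeros((n_rows, n_cols))
    for _ in range(n_perms):
        row_order = rng.permutation(n_rows)
        col_order = rng.permutation(n_cols)
        # grid[a, b] = utility of the first a rows and first b columns.
        # Row 0 and column 0 stay at 0 (assumed utility of an empty set).
        grid = np.zeros((n_rows + 1, n_cols + 1))
        for a in range(1, n_rows + 1):
            for b in range(1, n_cols + 1):
                grid[a, b] = utility(row_order[:a], col_order[:b])
        # Double difference: gain from adding the row and column together,
        # minus the gain from adding each one alone.
        marginal = grid[1:, 1:] - grid[:-1, 1:] - grid[1:, :-1] + grid[:-1, :-1]
        # Map positions in the sampled orderings back to original indices.
        values[np.ix_(row_order, col_order)] += marginal
    return values / n_perms

# Toy check: with a utility that is additive over cells, each cell's
# estimated value should equal its own weight.
W = np.arange(6, dtype=float).reshape(2, 3)

def toy_utility(rows, cols):
    return W[np.ix_(rows, cols)].sum()

cell_values = two_d_shapley_mc(toy_utility, 2, 3, n_perms=20)
```

Each sampled ordering costs one utility call per cell of the grid, so a real utility (training a model) would need far fewer permutations or a cheaper surrogate than this sketch suggests.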

Zhihong Liu, Hoang Anh Just, Xiangyu Chang, Xi Chen, Ruoxi Jia • 2023

Related benchmarks

| Task | Dataset | Result (AUCPR) | Rank |
|---|---|---|---|
| Point-level mislabeled data detection | gas_drift | 88 | 7 |
| Point-level mislabeled data detection | jannis | 19 | 7 |
| Point-level mislabeled data detection | Electricity | 20 | 7 |
| Point-level mislabeled data detection | fried | 0.34 | 7 |
| Point-level mislabeled data detection | 2Dplanes | 44 | 7 |
| Point-level mislabeled data detection | creditcard | 20 | 7 |
| Point-level mislabeled data detection | POL | 29 | 7 |
| Point-level mislabeled data detection | Miniboone | 0.36 | 7 |
| Point-level mislabeled data detection | lawschool | 46 | 7 |
| Point-level mislabeled data detection | nomao | 0.33 | 7 |
Showing 10 of 24 rows
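The page does not say how the AUCPR scores are computed. As a rough illustration only, the sketch below uses average precision, a common estimator of the area under the precision-recall curve, and assumes that points with lower data values are the ones flagged as mislabeled. The function name `average_precision` and the toy arrays are hypothetical.

```python
import numpy as np

def average_precision(suspicion, is_mislabeled):
    """Mean precision at the rank of each truly mislabeled point."""
    # Rank points from most to least suspicious.
    order = np.argsort(-np.asarray(suspicion, dtype=float), kind="stable")
    hits = np.asarray(is_mislabeled, dtype=bool)[order]
    if not hits.any():
        raise ValueError("need at least one mislabeled point")
    # Precision after each of the top-k points, for k = 1..n.
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())

# Toy data: per-point values and ground-truth mislabel flags.
point_values = np.array([0.9, -0.2, 0.5, 0.1, -0.4])
mislabeled = np.array([0, 1, 0, 1, 0])

# Assumption: a low value means "likely mislabeled", so negate the values.
ap = average_precision(-point_values, mislabeled)
```

A detector that ranks every mislabeled point ahead of every clean one scores 1.0 under this measure.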
