Efficient Data Shapley for Weighted Nearest Neighbor Algorithms
About
This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Mislabel Detection | 2Dplanes | AUROC0.24 | 17 | |
| Mislabel Detection | vehicle | AUROC0.11 | 17 | |
| Mislabel Detection | Wind | AUROC15 | 17 | |
| Mislabel Detection | apsfail | AUROC0.22 | 17 | |
| Mislabel Detection | CPU | AUROC10 | 17 | |
| Mislabel Detection | Fraud | AUROC11 | 17 | |
| Mislabel Detection | POL | AUROC0.05 | 17 | |
| Misclassification Detection | CIFAR10 | AUROC19 | 14 | |
| Identifying mislabeled points | 2Dplanes | Recall50 | 12 | |
| Identifying mislabeled points | 2Dplanes | Precision25 | 12 |