
Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization

About

In real-world classification problems, pairwise similarities and dissimilarities between data points can be easier to obtain than fully labeled data, e.g., in privacy-aware situations. To handle such pairwise information, an empirical risk minimization approach has been proposed that gives an unbiased estimator of the classification risk computable from only pairwise similarities and unlabeled data. However, this approach has so far been unable to handle pairwise dissimilarities. Semi-supervised clustering methods, on the other hand, can use both similarities and dissimilarities, but they typically require strong geometrical assumptions on the data distribution, such as the manifold assumption, and performance may deteriorate when those assumptions do not hold. In this paper, we derive an unbiased risk estimator that can handle similarities, dissimilarities, and unlabeled data together. We theoretically establish estimation error bounds and experimentally demonstrate the practical usefulness of our empirical risk minimization method.
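To make the idea of an unbiased risk estimator from pairwise information concrete, here is a minimal Python sketch. It is not the paper's estimator: it uses only similar and dissimilar pairs and omits the unlabeled data that the abstract's estimator also incorporates. It assumes binary labels, pairs formed from two independent draws of the data distribution (a pair is "similar" when both points share a class), and a known positive class prior different from 0.5. The names `sd_risk` and `logistic_loss` are hypothetical, chosen for illustration. Under these assumptions, the marginal densities of points in similar and dissimilar pairs are two different mixtures of the class-conditional densities, and inverting that 2x2 mixture gives the loss weights used below.

```python
import numpy as np


def logistic_loss(z, y):
    """Logistic loss log(1 + exp(-y * z)), computed stably."""
    return np.logaddexp(0.0, -y * z)


def sd_risk(f_sim, f_dis, prior_pos, loss=logistic_loss):
    """Estimate the classification risk from pairwise data only (a sketch).

    f_sim     : classifier outputs f(x) for every point in a similar pair
    f_dis     : classifier outputs f(x) for every point in a dissimilar pair
    prior_pos : class prior p(y = +1), assumed known and different from 0.5
    """
    f_sim = np.asarray(f_sim, dtype=float)
    f_dis = np.asarray(f_dis, dtype=float)
    pi_p, pi_n = prior_pos, 1.0 - prior_pos
    if np.isclose(pi_p, 0.5):
        raise ValueError("prior_pos = 0.5 makes the mixture non-invertible")

    pi_s = pi_p**2 + pi_n**2   # probability that a random pair is similar
    pi_d = 2.0 * pi_p * pi_n   # probability that a random pair is dissimilar
    c = 2.0 * pi_p - 1.0       # determinant of the 2x2 mixture system

    # Corrected losses: weights come from solving for the class-conditional
    # densities in terms of the similar/dissimilar pointwise marginals.
    loss_s = (pi_p * loss(f_sim, +1) - pi_n * loss(f_sim, -1)) / c
    loss_d = (pi_p * loss(f_dis, -1) - pi_n * loss(f_dis, +1)) / c
    return pi_s * loss_s.mean() + pi_d * loss_d.mean()
```

Calling `sd_risk(f_sim, f_dis, prior_pos=0.7)` with the outputs of a candidate classifier returns a scalar that can be minimized in place of the ordinary supervised risk. Because the corrected losses can be negative, the finite-sample estimate can also be negative, and the division by `2 * prior_pos - 1` inflates variance as the prior approaches 0.5.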

Takuya Shimada, Han Bao, Issei Sato, Masashi Sugiyama • 2019

Related benchmarks

Task                   Dataset                Result           Rank
Pairwise Comparison    CIFAR-100              Accuracy 61.22   22
Pairwise Comparison    CIFAR-10               Accuracy 63.28   22
Pairwise Comparison    STL-10                 Accuracy 67.05   22
Binary Classification  PMU-UD UCI (test)      Accuracy 94.4    17
Binary Classification  Optdigits UCI (test)   Accuracy 87.1    17
Binary Classification  Letter UCI (test)      Accuracy 69.4    17
Binary Classification  Pendigits UCI (test)   Accuracy 82.8    17
Binary Classification  Fashion (test)         AUC 96.6         13
Binary Classification  PMU-UD (test)          AUC 0.975        13
Binary Classification  MNIST (test)           AUC 95.2         13
Showing 10 of 14 rows
