
Classification Logit Two-sample Testing by Neural Networks

About

The recent success of generative adversarial networks and variational learning suggests that training a classifier network may work well in addressing the classical two-sample problem. Network-based tests have the computational advantage that the algorithm scales to large samples. This paper proposes a two-sample statistic that is the difference of the logit function, provided by a trained classification neural network, evaluated on the held-out test splits of the two datasets. Theoretically, we prove the testing power to differentiate two sub-exponential densities, provided that the network is sufficiently parametrized. When the two densities lie on or near low-dimensional manifolds embedded in a possibly high-dimensional space, the required network complexity scales only with the intrinsic dimensionality. Both the approximation and estimation error analyses are based on a new result on near-manifold integral approximation. In experiments, the proposed method demonstrates better performance than previous network-based tests that use classification accuracy as the two-sample statistic, and it compares favorably to certain kernel maximum mean discrepancy tests on synthetic datasets and hand-written digit datasets.
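The statistic described above lends itself to a short implementation. Below is a minimal, hypothetical sketch of the classifier-logit two-sample test: an off-the-shelf MLP (scikit-learn's MLPClassifier, standing in for the paper's network) is trained to distinguish the two training splits, the logit log p̂/(1−p̂) is evaluated on the held-out splits, and the statistic is the difference of mean logits. Significance is calibrated here by a permutation test, a common practical choice; the paper itself derives theoretical power guarantees. All names (logit_two_sample_test, etc.) are illustrative, not from the authors' code.

```python
# A minimal sketch of the classifier-logit two-sample test (names are
# illustrative; scikit-learn's MLPClassifier stands in for the paper's network).
import numpy as np
from sklearn.neural_network import MLPClassifier


def logit_two_sample_test(X, Y, n_perm=500, seed=0):
    """Return (statistic, p-value) for samples X ~ p and Y ~ q."""
    rng = np.random.default_rng(seed)

    def split(Z):
        # Split one sample into a training half and a held-out testing half.
        idx = rng.permutation(len(Z))
        half = len(Z) // 2
        return Z[idx[:half]], Z[idx[half:]]

    X_tr, X_te = split(X)
    Y_tr, Y_te = split(Y)

    # Train a classifier to distinguish the two training splits.
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=seed)
    clf.fit(np.vstack([X_tr, Y_tr]),
            np.r_[np.ones(len(X_tr)), np.zeros(len(Y_tr))])

    def logit(Z):
        # Logit f(z) = log p_hat(z) - log(1 - p_hat(z)) of the class-X probability.
        p = np.clip(clf.predict_proba(Z)[:, 1], 1e-12, 1 - 1e-12)
        return np.log(p) - np.log1p(-p)

    lx, ly = logit(X_te), logit(Y_te)
    stat = lx.mean() - ly.mean()  # difference of mean logits on the test splits

    # Permutation calibration: shuffle the pooled held-out logits under H0: p = q.
    pooled, n = np.concatenate([lx, ly]), len(lx)
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(pooled)
        null[b] = perm[:n].mean() - perm[n:].mean()
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value


if __name__ == "__main__":
    # Toy alternative: two 10-dimensional Gaussians with a small mean shift.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 10))
    Y = rng.standard_normal((1000, 10)) + 0.2
    stat, p = logit_two_sample_test(X, Y)
    print(f"stat={stat:.3f}, p={p:.4f}")  # small p-value -> reject H0: p = q
```

Training the network on one split and evaluating the statistic only on the held-out split is what keeps the logits exchangeable under the null, so the permutation calibration is valid.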

Xiuyuan Cheng, Alexander Cloninger • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Two-sample testing | Gaussian mixture data, Synthetic Example 1, d=10 | Test Power: 100% | 44 |
| Two-sample test | Higgs, alpha=0.05 (test) | Test Power: 98.5% | 42 |
| Two-sample test | MNIST, Real vs DCGAN samples (test) | Test Power: 100% | 36 |
| Distribution Shift Detection | CIFAR-10 vs CIFAR-10.1 | Average Rejection Rate: 0.529 | 27 |
| Two-sample testing | Gaussian mixture data, Example 2, d=10 (test) | Test Power (n_tr=500): 52.2% | 9 |
| Two-sample test | CIFAR-10 vs CIFAR-10.1, 1.0 (test) | Mean Rejection Rate: 0.529 | 6 |
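The "Test Power" and "Rejection Rate" entries above are, in the usual protocol, the fraction of independent trials in which the test rejects the null at level alpha (e.g., alpha=0.05 for the Higgs row). A hedged sketch of that estimation loop, reusing logit_two_sample_test from above; the sampler callback and trial count are assumptions, not the benchmarks' exact protocols:

```python
# Hedged sketch: "test power" / "rejection rate" as the fraction of repeated
# trials rejecting at level alpha; sampler() and n_trials are assumptions,
# not the benchmarks' exact protocols.
def estimate_power(sampler, n_trials=100, alpha=0.05):
    rejections = 0
    for t in range(n_trials):
        X, Y = sampler()                        # fresh draw of the two samples
        _, p = logit_two_sample_test(X, Y, seed=t)
        rejections += int(p <= alpha)
    return rejections / n_trials                # e.g. 0.985 -> 98.5% power
```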
