Classification Logit Two-sample Testing by Neural Networks
About
The recent success of generative adversarial networks and variational learning suggests that training a classifier network may work well in addressing the classical two-sample problem. Network-based tests have the computational advantage that the algorithm scales to large samples. This paper proposes a two-sample statistic that is the difference of the logit function, provided by a trained classification neural network, evaluated on the test-set split of the two datasets. Theoretically, we prove that the test has power to differentiate two sub-exponential densities provided the network is sufficiently parametrized. When the two densities lie on or near low-dimensional manifolds embedded in a possibly high-dimensional space, the required network complexity scales only with the intrinsic dimensionality. Both the approximation and estimation error analyses are based on a new result on near-manifold integral approximation. In experiments, the proposed method demonstrates better performance than previous network-based tests that use classification accuracy as the two-sample statistic, and compares favorably to certain kernel maximum mean discrepancy tests on synthetic datasets and hand-written digit datasets.
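The proposed statistic is the difference of mean logits between the two held-out test splits, where the logit function comes from a classifier trained to distinguish the two samples. The sketch below illustrates this pipeline; it substitutes a plain logistic regression trained by gradient descent for the paper's neural network (an assumption for brevity), and the data, learning rate, and step count are illustrative choices, not the paper's settings.

```python
import numpy as np

def train_logit_classifier(X_tr, Y_tr, lr=0.1, steps=500):
    """Train a binary classifier on the training splits and return its logit function.

    Logistic regression here is a stand-in for the paper's classification
    neural network (assumption made for a self-contained example).
    """
    Z = np.vstack([X_tr, Y_tr])
    labels = np.concatenate([np.ones(len(X_tr)), np.zeros(len(Y_tr))])
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = Z @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))      # sigmoid probabilities
        grad = p - labels                       # gradient of the logistic loss
        w -= lr * (Z.T @ grad) / len(Z)
        b -= lr * grad.mean()
    return lambda U: U @ w + b                  # the learned logit function

def logit_two_sample_stat(f, X_te, Y_te):
    """Two-sample statistic: difference of mean logits on the test splits."""
    return f(X_te).mean() - f(Y_te).mean()

# Toy example: two Gaussians with a mean shift (illustrative data).
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(400, 5))  # sample from p
Y = rng.normal(0.5, 1.0, size=(400, 5))  # sample from q
f = train_logit_classifier(X[:200], Y[:200])   # train on the first halves
stat = logit_two_sample_stat(f, X[200:], Y[200:])  # evaluate on held-out halves
print(stat)
```

Under the alternative, the classifier assigns systematically higher logits to the class it was trained to label positive, so the statistic is bounded away from zero; in practice the rejection threshold would be calibrated, e.g. by a permutation test over the pooled test split.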
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Two-sample testing | Gaussian mixture data, Synthetic Example 1, d=10 | Test Power: 100 | 44 |
| Two-sample test | Higgs, alpha=0.05 (test) | Test Power: 98.5 | 42 |
| Two-sample test | MNIST, Real vs DCGAN samples (test) | Test Power: 100 | 36 |
| Distribution Shift Detection | CIFAR-10 vs CIFAR-10.1 | Average Rejection Rate: 0.529 | 27 |
| Two-sample testing | Gaussian mixture data, Example 2, d=10 (test) | Test Power (n_tr=500): 52.2 | 9 |
| Two-sample test | CIFAR-10 vs CIFAR-10.1 1.0 (test) | Mean Rejection Rate: 0.529 | 6 |