Topological Metric for Unsupervised Embedding Quality Evaluation
About
Modern representation learning increasingly relies on unsupervised and self-supervised methods trained on large-scale unlabeled data. While these approaches achieve impressive generalization across tasks and domains, evaluating embedding quality without labels remains an open challenge. In this work, we propose Persistence, a topology-aware metric based on persistent homology that quantifies the geometric structure and topological richness of embedding spaces in a fully unsupervised manner. Unlike metrics that assume linear separability or rely on covariance structure, Persistence captures global and multi-scale organization. Empirical results across diverse domains show that Persistence consistently achieves top-tier correlations with downstream performance, outperforming existing unsupervised metrics and enabling reliable model and hyperparameter selection.
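To give a sense of what a persistent-homology summary of an embedding looks like, here is a minimal sketch that is not the paper's definition of Persistence. It covers only 0-dimensional homology (connected components) of a Vietoris-Rips filtration, where every point is born at scale 0 and the component death scales are the edge lengths of the minimum spanning tree. The function names `h0_lifetimes` and `total_h0_persistence` are hypothetical, and the actual metric may use higher-dimensional features and a different normalization.

```python
import math
from itertools import combinations


def h0_lifetimes(points):
    """Return the finite H0 lifetimes of a Vietoris-Rips filtration.

    Each point starts as its own component at scale 0. A component dies
    at the length of the edge that first merges it into another one,
    which is exactly an edge chosen by Kruskal's MST algorithm.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise Euclidean distances, processed in increasing order.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at scale `dist`
            deaths.append(dist)
            if len(deaths) == n - 1:
                break
    return deaths


def total_h0_persistence(points):
    """Sum of finite H0 lifetimes (an illustrative summary, assumed here)."""
    return sum(h0_lifetimes(points))


# Toy example: two tight pairs of points that are far apart.
toy_embedding = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(h0_lifetimes(toy_embedding))          # [1.0, 1.0, 10.0]
print(total_h0_persistence(toy_embedding))  # 12.0
```

The single long lifetime (10.0) reflects the gap between the two clusters, while the short ones reflect within-cluster spacing. This is the kind of multi-scale information a topology-aware metric can draw on. The brute-force pairwise loop is quadratic in the number of points, so this sketch is only practical on small samples of an embedding.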
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Collaborative Filtering | MovieLens-20M User (test) | Pearson R | 0.778 | 16 |
| Collaborative Filtering | MovieLens Item 20M (test) | Pearson R | 0.893 | 8 |
| Embedding Quality Evaluation | Behavioral modeling | Pearson Correlation | 0.861 | 8 |
| Optimal epoch selection | Financial analytics Gender and Age | Pearson Correlation Coefficient | 0.691 | 8 |
| Embedding Quality Evaluation | Financial analytics | Pearson Corr | 0.671 | 8 |
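The results above are Pearson correlations. As a hedged illustration of how such a figure can be computed, the sketch below correlates an unsupervised metric's score per checkpoint with the downstream score of the same checkpoint. The helper `pearson_r` and all numbers are hypothetical placeholders, not values from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Placeholder values: one unsupervised metric score and one downstream
# score per training checkpoint (illustrative only).
metric_scores = [1.0, 2.0, 3.0, 4.0]
downstream_scores = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(metric_scores, downstream_scores)

# Label-free model selection: pick the checkpoint the metric ranks highest.
best_checkpoint = max(range(len(metric_scores)), key=metric_scores.__getitem__)
print(r, best_checkpoint)
```

A high correlation suggests that choosing the checkpoint with the best unsupervised score would also tend to choose one with strong downstream performance, which is the use case the abstract describes for model and hyperparameter selection.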