Unsupervised learning of object landmarks by factorized spatial embeddings
About
Learning automatically the structure of object categories remains an important open problem in computer vision. In this paper, we propose a novel unsupervised approach that can discover and learn landmarks in object categories, thus characterizing their structure. Our approach is based on factorizing image deformations, as induced by a viewpoint change or an object deformation, by learning a deep neural network that detects landmarks consistently with such visual effects. Furthermore, we show that the learned landmarks establish meaningful correspondences between different object instances in a category without having to impose this requirement explicitly. We assess the method qualitatively on a variety of object types, natural and man-made. We also show that our unsupervised landmarks are highly predictive of manually-annotated landmarks in face benchmark datasets, and can be used to regress these with a high degree of accuracy.
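The consistency requirement described above can be illustrated with a small numerical sketch. It assumes a known warp `g` that maps point coordinates in one image to coordinates in its deformed copy, and checks that landmarks detected in the deformed image agree with the warped landmarks from the original. The affine warp, the function names, and the toy coordinates are all illustrative assumptions; this is a simplified stand-in for the idea, not the paper's actual network or loss.

```python
import numpy as np


def apply_affine(points, A, t):
    """Map an array of (K, 2) points through the affine warp u -> A @ u + t."""
    return points @ A.T + t


def equivariance_loss(landmarks_src, landmarks_dst, A, t):
    """Mean squared distance between the warped source landmarks and the
    landmarks detected in the warped image (hypothetical detector outputs)."""
    expected = apply_affine(landmarks_src, A, t)
    return float(np.mean(np.sum((landmarks_dst - expected) ** 2, axis=1)))


# Toy warp (assumption): a 10-degree rotation plus a small translation.
theta = np.deg2rad(10.0)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
t = np.array([0.05, -0.02])

# Hypothetical detector outputs on the original image (normalized coordinates).
src = np.array([[0.3, 0.4], [0.7, 0.4], [0.5, 0.6]])

# A detector that moves with the warp incurs (near) zero loss ...
consistent = apply_affine(src, A, t)
# ... while one that is off by 0.1 along each axis is penalized.
inconsistent = consistent + 0.1

loss_consistent = equivariance_loss(src, consistent, A, t)
loss_inconsistent = equivariance_loss(src, inconsistent, A, t)
print(loss_consistent, loss_inconsistent)
```

In a training setting, a loss of this kind would be minimized over the detector's parameters using synthetically warped image pairs, so that no manual landmark annotations are needed.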
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Landmark Localization | AFLW (test) | NME (%) | 10.53 | 54 |
| Landmark Prediction | MAFL (test) | Mean Error (%) | 5.33 | 38 |
| Facial Landmark Detection | MAFL (test) | Normalised MSE (%) | 6.67 | 30 |
| Landmark Regression | MAFL (test) | MSE (%) | 6.67 | 28 |
| Landmark Regression | wild CelebA (test) | Mean Normalized L2 Error | 31.3 | 17 |
| Landmark Detection | CelebA Wild (K=8) (test) | Normalized L2 Distance (%) | 31.3 | 14 |
| Landmark Prediction | 300-W (test) | Landmark Prediction Error | 9.3 | 12 |
| Landmark Prediction | Cat head (test) | Mean Error (%) | 0.2676 | 10 |
| Landmark Prediction | AFLW (test) | Mean Error (%) | 8.8 | 10 |
| Landmark Detection | MAFL (test) | Inter-ocular Distance Error (%) | 6.67 | 10 |
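
Several of the metrics above are landmark errors normalized by a face-size reference. This page does not define them, so the following is only a sketch of one commonly used convention: the mean Euclidean landmark error divided by the ground-truth inter-ocular distance, reported as a percentage. The function name, the array layout, and the assumption that indices 0 and 1 hold the two eye landmarks are illustrative choices, not taken from the benchmarks themselves.

```python
import numpy as np


def interocular_error_percent(pred, gt, left_eye=0, right_eye=1):
    """Mean landmark error normalized by inter-ocular distance, in percent.

    pred, gt: arrays of shape (N, K, 2) holding predicted and ground-truth
    landmark coordinates. The eye indices are assumptions about the
    annotation order.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Per-landmark Euclidean error, shape (N, K).
    per_landmark = np.linalg.norm(pred - gt, axis=-1)
    # Ground-truth distance between the two eyes for each face, shape (N,).
    iod = np.linalg.norm(gt[:, left_eye] - gt[:, right_eye], axis=-1)
    return float(100.0 * np.mean(per_landmark / iod[:, None]))


# Toy example: one face, five landmarks, eyes 40 pixels apart.
gt = np.array([[[30.0, 40.0], [70.0, 40.0], [50.0, 60.0],
                [35.0, 80.0], [65.0, 80.0]]])
# Every prediction is off by (3, 4), i.e. 5 pixels.
pred = gt + np.array([3.0, 4.0])

err = interocular_error_percent(pred, gt)
print(err)  # 5 / 40 = 12.5 percent
```

Individual benchmarks may normalize differently (for example by face bounding-box size), so figures computed this way are only comparable to results that use the same convention.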