TUSK: Task-Agnostic Unsupervised Keypoints
About
Existing unsupervised methods for keypoint learning rely heavily on the assumption that a specific keypoint type (e.g. elbow, digit, abstract geometric shape) appears only once in an image. This greatly limits their applicability, as each instance must be isolated before applying the method, an issue that is never discussed or evaluated. We thus propose a novel method to learn Task-agnostic, UnSupervised Keypoints (TUSK) which can deal with multiple instances. To achieve this, instead of the commonly-used strategy of detecting multiple heatmaps, each dedicated to a specific keypoint type, we use a single heatmap for detection, and enable unsupervised learning of keypoint types through clustering. Specifically, we encode semantics into the keypoints by teaching them to reconstruct images from a sparse set of keypoints and their descriptors, where the descriptors are forced to form distinct clusters in feature space around learned prototypes. This makes our approach amenable to a wider range of tasks than any previous unsupervised keypoint method: we show experiments on multiple-instance detection and classification, object discovery, and landmark detection (all unsupervised), with performance on par with the state of the art, while also being able to deal with multiple instances.
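To make the two ideas in the abstract concrete (several keypoints read off a single heatmap, and keypoint types obtained by clustering descriptors around prototypes), here is a minimal NumPy sketch. It is not the authors' implementation: TUSK learns the heatmap, descriptors, and prototypes end to end through image reconstruction, whereas this sketch assumes they are already given and substitutes hard local-maximum selection and nearest-prototype assignment. The names `detect_keypoints` and `assign_types`, and the `window` and `threshold` parameters, are hypothetical.

```python
import numpy as np


def detect_keypoints(heatmap, k, window=3, threshold=0.0):
    """Return up to k (row, col) local maxima of one heatmap, strongest first.

    Illustrative stand-in for keypoint detection: simple non-maximum
    suppression over an odd-sized square window.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    r = window // 2
    # Pad with -inf so border pixels are only compared to real neighbours.
    padded = np.pad(heatmap, r, mode="constant", constant_values=-np.inf)
    # A pixel is a peak if it is >= every pixel in its window.
    is_peak = heatmap > threshold
    for dy in range(window):
        for dx in range(window):
            is_peak &= heatmap >= padded[dy:dy + h, dx:dx + w]
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(-heatmap[ys, xs])[:k]
    return np.stack([ys[order], xs[order]], axis=1)


def assign_types(descriptors, prototypes):
    """Assign each descriptor the index of its nearest prototype (squared L2)."""
    diff = descriptors[:, None, :] - prototypes[None, :, :]
    return (diff ** 2).sum(axis=-1).argmin(axis=1)


# Toy example: one heatmap containing two instances.
heatmap = np.zeros((8, 8))
heatmap[2, 3] = 1.0
heatmap[6, 5] = 0.8
keypoints = detect_keypoints(heatmap, k=5)

# Toy descriptors (one per keypoint) and two assumed prototypes.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2]])
types = assign_types(descriptors, prototypes)
```

Because all detections come from the same heatmap, nothing in this setup limits a keypoint type to one occurrence per image; the type is decided afterwards by the descriptor's cluster.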
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Landmark Regression | wild CelebA (test) | Mean Normalized L2 Error | 18.49 | 17 |
| Object Detection | MNIST-Hard (test) | Localization Accuracy | 99.9 | 5 |
| Landmark Detection | Human3.6M (test) | Normalized Error | 6.88 | 4 |
| Object Discovery | CLEVR6 (test) | ARI | 0.983 | 4 |
| Object Discovery | Tetrominoes (test) | ARI | 99.7 | 3 |
| Unsupervised Property Classification | CLEVR6 (test) | Shape Acc | 46.8 | 1 |
| Unsupervised Property Classification | Tetrominoes (test) | Shape Acc | 91.3 | 1 |
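For readers unfamiliar with the ARI metric used in the object discovery rows, the following is a small, standard-library sketch of the Adjusted Rand Index between two labelings of the same items. It is offered as an illustration only: the benchmark's exact evaluation protocol (for example, whether background pixels are excluded) is not stated here, and the function name `adjusted_rand_index` is hypothetical. The function returns a value of at most 1, so the Tetrominoes entry of 99.7 appears to be reported as a percentage.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat label sequences of equal length."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # Fewer than two items: the partitions trivially agree.

    # Contingency counts and the marginals of each labeling.
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs grouped together under each view.
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # Degenerate case, e.g. both labelings use a single cluster.
    return (sum_joint - expected) / (max_index - expected)


# Identical partitions under different label names score 1.0.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster scores below zero.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The score is invariant to how clusters are named, which is why it suits unsupervised object discovery, where predicted segment indices carry no fixed meaning.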