Learning from One and Only One Shot
About
Humans can generalize from only a few examples and from little pretraining on similar tasks. Yet machine learning (ML) typically requires large amounts of data, either to learn a task directly or to pretrain for transfer. Motivated by nativism and artificial general intelligence, we directly model human-innate priors in abstract visual tasks such as character and doodle recognition. This yields a white-box model that learns general-appearance similarity by mimicking how humans naturally "distort" an object at first sight. Using just nearest-neighbor classification on this cognitively inspired similarity space, we achieve human-level recognition with only 1 to 10 examples per class and no pretraining. This differs from few-shot learning, which relies on massive pretraining. In the tiny-data regime of the MNIST, EMNIST, Omniglot, and QuickDraw benchmarks, we outperform both modern neural networks and classical ML. For unsupervised learning, by learning the non-Euclidean, general-appearance similarity space in a k-means style, we obtain varied visual realizations of abstract concepts by generating human-intuitive archetypes as cluster centroids.
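The classification step described in the abstract, nearest-neighbor lookup over a handful of labeled exemplars, can be illustrated with a minimal sketch. The abstract does not specify the distortion-based similarity, so the code below plugs in plain Euclidean pixel distance as a stand-in; the names `pixel_distance` and `one_shot_classify` and the toy 3x3 images are hypothetical and chosen only for illustration.

```python
from math import dist
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]


def flatten(image: Image) -> List[float]:
    """Turn a 2-D grid of pixel values into a flat list."""
    return [pixel for row in image for pixel in row]


def pixel_distance(a: Image, b: Image) -> float:
    """Euclidean pixel distance.

    ASSUMPTION: this is only a placeholder. The paper's similarity is
    non-Euclidean and distortion-based; it is not reproduced here.
    """
    return dist(flatten(a), flatten(b))


def one_shot_classify(
    query: Image,
    exemplars: Sequence[Image],
    labels: Sequence[str],
    dissimilarity: Callable[[Image, Image], float] = pixel_distance,
) -> str:
    """Return the label of the exemplar with the smallest dissimilarity."""
    scores = [dissimilarity(query, exemplar) for exemplar in exemplars]
    best = min(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# One exemplar per class: a vertical bar and a horizontal bar.
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
exemplars = [vertical, horizontal]
labels = ["vertical", "horizontal"]

# A vertical bar with its bottom pixel missing.
query = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
prediction = one_shot_classify(query, exemplars, labels)
print(prediction)
```

Because the dissimilarity is passed in as a function, a learned or hand-designed similarity could replace `pixel_distance` without changing the classifier itself.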
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Counting | Counting | Accuracy | 37.9 | 7 |
| Indexing | indexing | Answer Rate | 100 | 4 |
| max-float | max-float | Answer Rate | 100 | 4 |
| max-int | max-int | Answer Rate | 100 | 4 |
| min-float | min-float | Answer Rate | 100 | 4 |
| min-int | min-int | Answer Rate | 100 | 4 |
| stock | Stock | Answer Rate | 63.3 | 4 |
| weather | Weather | Answer Rate | 65.9 | 4 |
| number-list | number-list | Answer Rate | 73.4 | 4 |
| number-string | number-string | Answer Rate | 96.3 | 4 |