# Context-Aware Meta-Learning

## About
Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new concepts during inference without any fine-tuning. In contrast, visual models trained to detect new objects during inference have been unable to replicate this ability: they either perform poorly or require meta-training and/or fine-tuning on similar objects. In this work, we propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning. Our approach leverages a frozen pre-trained feature extractor and, analogous to in-context learning, recasts visual meta-learning as sequence modeling over datapoints with known labels and a test datapoint with an unknown label. On 8 out of 11 meta-learning benchmarks, our approach -- without meta-training or fine-tuning -- exceeds or matches the state-of-the-art algorithm, P>M>F, which is meta-trained on these benchmarks. Our code is available at https://github.com/cfifty/CAML.
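The sequence-modeling formulation above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it only shows how a support set of frozen-backbone features with known labels and one unlabeled query could be packed into a single sequence for a Transformer-style sequence model to attend over. The function name, the one-hot label encoding, and the all-zero "unknown label" vector for the query are illustrative assumptions.

```python
import numpy as np

def build_icl_sequence(support_feats, support_labels, query_feat, n_classes):
    """Pack a few-shot episode into one sequence of tokens.

    Each support token concatenates a frozen feature vector with a
    one-hot encoding of its known label; the query token uses an
    all-zero label slot to mark its label as unknown (an illustrative
    choice, not necessarily the paper's exact encoding).
    Returns an array of shape (n_support + 1, d_feat + n_classes).
    """
    one_hot = np.eye(n_classes)[support_labels]            # (n, n_classes)
    support_tokens = np.concatenate([support_feats, one_hot], axis=1)
    query_token = np.concatenate([query_feat, np.zeros(n_classes)])
    return np.vstack([support_tokens, query_token[None, :]])

# Toy 2-way 1-shot episode with 4-dimensional features.
rng = np.random.default_rng(0)
support_feats = rng.standard_normal((2, 4))
query_feat = rng.standard_normal(4)
seq = build_icl_sequence(support_feats, np.array([0, 1]), query_feat, 2)
print(seq.shape)  # (3, 6): two labeled support tokens plus one query token
```

In the full method, a sequence model would process these tokens jointly so the query representation can attend to the labeled support points, and the query's class is read off from its output token.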
## Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Few-shot classification | tieredImageNet (test) | Accuracy | 98.1 | 282 |
| Image Classification | MiniImagenet | Accuracy | 96.2 | 206 |
| Few-shot classification | CUB (test) | Accuracy | 97.1 | 145 |
| Few-shot Image Classification | miniImageNet (test) | Accuracy | 98.6 | 111 |
| Few-shot Image Classification | tieredImageNet (test) | Accuracy | 98.1 | 86 |
| Few-shot classification | CIFAR FS (test) | Mean Accuracy | 85.5 | 51 |
| Few-shot classification | ChestX (test) | Accuracy | 22.2 | 46 |
| Few-shot classification | meta-iNat (test) | Accuracy | 96.3 | 34 |
| Few-shot classification | tiered meta-iNat (test) | Accuracy | 91.6 | 34 |
| Few-shot Image Classification | Aircraft (test) | Mean Accuracy | 79.1 | 28 |