PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
About
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. PERFECT makes two key design choices. First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by roughly factors of 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex autoregressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. Our code is publicly available at https://github.com/facebookresearch/perfect.git.
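The two design choices above can be sketched in a few lines. The following is an illustrative NumPy mock-up, not the released implementation: random matrices stand in for a frozen PLM's hidden states, the adapter follows the standard bottleneck pattern (down-projection, nonlinearity, up-projection, residual), and classification scores each class by dot products between hidden states at mask positions and per-class learned label embeddings, so no verbalizer tokens and no autoregressive decoding are involved. All names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN, BOTTLENECK = 768, 24   # bottleneck << hidden: the source of the memory savings
NUM_MASKS, NUM_CLASSES = 2, 3  # multi-token labels: one embedding per mask position

# Hypothetical task-specific adapter: the only weights trained per task.
W_down = rng.normal(0, 0.02, (HIDDEN, BOTTLENECK))
W_up = rng.normal(0, 0.02, (BOTTLENECK, HIDDEN))

def adapter(h):
    """Bottleneck adapter with a residual connection; h is (seq_len, HIDDEN)."""
    return h + np.maximum(h @ W_down, 0.0) @ W_up

# Learned label embeddings: one vector per (mask position, class),
# free parameters not tied to any vocabulary token.
label_emb = rng.normal(0, 0.02, (NUM_MASKS, NUM_CLASSES, HIDDEN))

def classify(mask_states):
    """Score classes from hidden states at the inserted mask positions.

    mask_states: (NUM_MASKS, HIDDEN). Each class is scored by summing
    dot products over mask positions; prediction is a single argmax,
    i.e. one forward pass with no decoding loop.
    """
    scores = np.einsum('mh,mch->c', mask_states, label_emb)
    return int(np.argmax(scores))

# Stand-in for one layer of frozen PLM hidden states over a 16-token input.
hidden = rng.normal(size=(16, HIDDEN))
pred = classify(adapter(hidden)[:NUM_MASKS])
```

In training, only `W_down`, `W_up`, and `label_emb` would receive gradients (e.g. via a cross-entropy loss over `scores`), which is why per-task storage shrinks to the adapter and label-embedding parameters alone.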
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Subjectivity Classification | Subj | Accuracy | 89.1 | 266 |
| Sentiment Analysis | CR | Accuracy | 90.0 | 123 |
| Paraphrase Detection | MRPC | Avg. Accuracy | 67.8 | 89 |
| Word Sense Disambiguation | WiC | Avg. Accuracy | 53.8 | 84 |
| Natural Language Inference | CB | Avg. Accuracy | 90.3 | 29 |
| Natural Language Inference | RTE | Avg. Accuracy | 60.7 | 21 |
| Sentiment Analysis | MR | Avg. Accuracy | 86.3 | 11 |
| Paraphrase Detection | QQP | Avg. Accuracy | 71.2 | 8 |
| Question Classification | TREC | Avg. Accuracy | 90.6 | 8 |
| Sentiment Analysis | SST-2 | Avg. Accuracy | 90.9 | 8 |