Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
About
Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning. In this work, we show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering. In fact, one can use null prompts, prompts that contain neither task-specific templates nor training examples, and achieve accuracy competitive with manually tuned prompts across a wide range of tasks. While finetuning LMs does introduce new parameters for each downstream task, we show that this memory overhead can be substantially reduced: finetuning only the bias terms can achieve comparable or better accuracy than standard finetuning while updating only 0.1% of the parameters. All in all, we recommend finetuning LMs for few-shot learning, as it is more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.
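The bias-only finetuning described above amounts to marking only the bias parameters as trainable and freezing everything else. A minimal sketch of the parameter selection, assuming PyTorch-style parameter naming (names ending in `.bias`); the layer shapes below are hypothetical stand-ins for one transformer block, not taken from the paper:

```python
from math import prod

# Hypothetical parameter-name -> shape map for a single transformer block,
# following the common PyTorch naming convention where biases end in ".bias".
param_shapes = {
    "attn.query.weight": (768, 768), "attn.query.bias": (768,),
    "attn.key.weight":   (768, 768), "attn.key.bias":   (768,),
    "attn.value.weight": (768, 768), "attn.value.bias": (768,),
    "attn.out.weight":   (768, 768), "attn.out.bias":   (768,),
    "mlp.fc1.weight":    (3072, 768), "mlp.fc1.bias":   (3072,),
    "mlp.fc2.weight":    (768, 3072), "mlp.fc2.bias":   (768,),
    "ln.weight": (768,), "ln.bias": (768,),
}

def num_params(shapes):
    """Total number of scalar parameters across all tensors."""
    return sum(prod(shape) for shape in shapes.values())

# Select only the bias terms for training; all other tensors stay frozen.
trainable = {name: s for name, s in param_shapes.items() if name.endswith(".bias")}
fraction = num_params(trainable) / num_params(param_shapes)

print(f"trainable fraction: {fraction:.4%}")
```

For these hypothetical shapes the trainable fraction comes out near 0.1%, in line with the memory savings the abstract reports. In an actual PyTorch model the same selection would set `requires_grad = False` on every parameter whose name does not end in `.bias`.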
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Subjectivity Classification | Subj | Accuracy | 81.8 | 266 |
| Sentiment Analysis | CR | Accuracy | 89.9 | 123 |
| Paraphrase Detection | MRPC | Avg. Accuracy | 63.9 | 89 |
| Word Sense Disambiguation | WiC | Avg. Accuracy | 52.4 | 84 |
| Natural Language Inference | CB | Avg. Accuracy | 91.0 | 29 |
| Natural Language Inference | RTE | Avg. Accuracy | 64.4 | 21 |
| Sentiment Analysis | MR | Avg. Accuracy | 84.9 | 11 |
| Sentiment Analysis | SST-2 | Avg. Accuracy | 89.8 | 8 |
| Sentiment Analysis | SST-5 | Avg. Accuracy | 45.7 | 8 |
| Paraphrase Detection | QQP | Avg. Accuracy | 70.4 | 8 |