GPT Understands, Too
About
Prompting a pretrained language model with natural-language patterns has proven effective for natural language understanding (NLU). However, our preliminary study reveals that manual discrete prompts often lead to unstable performance -- e.g., changing a single word in the prompt can cause a substantial performance drop. We propose P-Tuning, a novel method that employs trainable continuous prompt embeddings concatenated with discrete prompts. Empirically, P-Tuning not only stabilizes training by narrowing the performance gap across different discrete prompts, but also improves performance by a sizeable margin on a wide range of NLU tasks, including LAMA and SuperGLUE. P-Tuning is effective for both frozen and tuned language models, under both fully-supervised and few-shot settings.
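The core idea above can be sketched in a few lines: learnable continuous "virtual token" embeddings are concatenated with the embeddings of a discrete natural-language prompt before the sequence is fed to the language model. The sketch below is a minimal illustration using numpy; all names (`word_embeddings`, `prompt_embeddings`, `build_input`) are hypothetical and do not come from the paper's actual code release, and the trainable part would in practice be optimized by gradient descent while the LM weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim, num_virtual_tokens = 100, 16, 4

# Frozen word-embedding table of the pretrained LM (random stand-in here).
word_embeddings = rng.normal(size=(vocab_size, embed_dim))

# Trainable continuous prompt embeddings ("virtual tokens"); these are the
# parameters P-Tuning learns, while the LM itself can remain frozen.
prompt_embeddings = rng.normal(size=(num_virtual_tokens, embed_dim))

def build_input(discrete_token_ids):
    """Concatenate the continuous prompt embeddings with the embeddings
    of a discrete (natural-language) prompt to form the LM input."""
    discrete_part = word_embeddings[discrete_token_ids]  # (T, D)
    return np.concatenate([prompt_embeddings, discrete_part], axis=0)

inputs = build_input([5, 17, 42])  # 3 discrete prompt tokens
print(inputs.shape)                # (4 virtual + 3 discrete, 16)
```

Because only the small `prompt_embeddings` matrix is updated, the method adds very few trainable parameters compared with full fine-tuning, which is what makes it applicable to frozen models.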
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | Common Sense Reasoning Tasks | Avg Score | 18.11 | 241 |
| Mathematical Reasoning | GSM8K | Accuracy | 2.65 | 57 |
| Binary Classification | GLUE (test) | QNLI Accuracy | 58.8 | 25 |
| Dialogue Generation | ConvAI2 | BLEU | 1.5 | 24 |
| Sentiment Classification | SST-5 (32 samples) | Accuracy | 40.9 | 11 |
| Sentiment Classification | SST-2 (32 samples) | Accuracy | 87.6 | 11 |